dodiscover.cd.BaseConditionalDiscrepancyTest

class dodiscover.cd.BaseConditionalDiscrepancyTest

Abstract class for any conditional discrepancy test.

Conditional discrepancy (CD) tests are used in constraint-based causal discovery algorithms. The interface is intentionally lightweight, so that any function implementing a CD test can be wrapped in a class exposing this API.
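To illustrate what "wrapping a function in a class" can look like, here is a minimal sketch. It does not import dodiscover: `CDTestSketch` is a hypothetical stand-in that only copies the documented `test(df, group_col, y_vars, x_vars)` signature, and `residual_mean_gap` and `ResidualMeanCDTest` are hypothetical toy names. The statistic compares mean regression residuals between the two groups, and the p-value comes from plain label permutation, which is only a rough toy null (it ignores any dependence between X and the group).

```python
from abc import ABC, abstractmethod
from typing import Set, Tuple

import numpy as np
import pandas as pd


class CDTestSketch(ABC):
    """Hypothetical stand-in that copies the documented abstract signature.

    This is NOT dodiscover's BaseConditionalDiscrepancyTest.
    """

    @abstractmethod
    def test(self, df: pd.DataFrame, group_col, y_vars: Set, x_vars: Set) -> Tuple[float, float]:
        """Return (test statistic, p-value)."""


def residual_mean_gap(y: np.ndarray, x: np.ndarray, g: np.ndarray) -> float:
    """A plain function: |mean residual in group 1 - mean residual in group 0|.

    Residuals come from a pooled least-squares fit of y on x with an intercept.
    """
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    return float(abs(resid[g == 1].mean() - resid[g == 0].mean()))


class ResidualMeanCDTest(CDTestSketch):
    """Toy CD test that wraps residual_mean_gap behind the class API."""

    def __init__(self, n_perms: int = 200, random_state=None):
        self.n_perms = n_perms
        self.rng = np.random.default_rng(random_state)

    def test(self, df, group_col, y_vars, x_vars):
        y = df[sorted(y_vars)].to_numpy()[:, 0]  # toy: only the first outcome column
        x = df[sorted(x_vars)].to_numpy()
        g = df[group_col].to_numpy()
        stat = residual_mean_gap(y, x, g)
        # Toy null: plain permutation of group labels (ignores X-group dependence).
        null = np.array(
            [residual_mean_gap(y, x, self.rng.permutation(g)) for _ in range(self.n_perms)]
        )
        pvalue = (1 + np.sum(null >= stat)) / (1 + self.n_perms)
        return stat, float(pvalue)


# Usage on simulated data where P(Y | X) shifts with the group.
rng = np.random.default_rng(0)
n = 400
g = rng.integers(0, 2, size=n)
x = rng.normal(size=n)
y = 2 * x + 1.5 * g + rng.normal(size=n)
df = pd.DataFrame({"x": x, "y": y, "g": g})
stat, pvalue = ResidualMeanCDTest(random_state=0).test(df, "g", {"y"}, {"x"})
print(stat, pvalue)
```

The design choice is the point here: the statistic lives in an ordinary function, and the class only adapts the DataFrame-based call signature to it.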

Methods

`compute_null`(e_hat, X, Y[, null_reps, ...])	Estimate null distribution using propensity weights.
`test`(df, group_col, y_vars, x_vars)	Compute conditional discrepancy test.

compute_null(e_hat, X, Y, null_reps=1000, random_state=None)

Estimate null distribution using propensity weights.

Parameters:

e_hat (array-like of shape (n_samples,)): The predicted propensity score for group_ind == 1, i.e. the estimated probability that each sample belongs to group 1.
X (array-like of shape (n_samples, n_features_x)): The X (covariates) array.
Y (array-like of shape (n_samples, n_features_y)): The Y (outcomes) array.
null_reps (int, optional): Number of times to sample the null; by default 1000.
random_state (int, optional): Random seed or random generator; by default None.

Returns:

null_dist (array-like of shape (null_reps,)): The null distribution of test statistics, one statistic per null replicate.
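The docstring only states that propensity weights are used. One plausible reading, sketched below under that assumption, is that each null replicate redraws every sample's group label from Bernoulli(e_hat[i]) and recomputes the test statistic, so the resampled labels depend on X (through e_hat) but not on Y. The `statistic` callable, the function name `compute_null_sketch`, and the toy `mean_gap` statistic are hypothetical; the documented method takes no statistic argument.

```python
from typing import Callable, Optional

import numpy as np


def compute_null_sketch(
    statistic: Callable[[np.ndarray, np.ndarray, np.ndarray], float],
    e_hat: np.ndarray,
    X: np.ndarray,
    Y: np.ndarray,
    null_reps: int = 1000,
    random_state: Optional[int] = None,
) -> np.ndarray:
    """Null distribution via propensity-based resampling of group labels (sketch)."""
    rng = np.random.default_rng(random_state)
    e_hat = np.asarray(e_hat, dtype=float)
    null_dist = np.empty(null_reps)
    for b in range(null_reps):
        g_star = rng.binomial(1, e_hat)  # one Bernoulli(e_hat[i]) draw per sample
        null_dist[b] = statistic(X, Y, g_star)
    return null_dist  # shape (null_reps,)


def mean_gap(X, Y, g):
    # Hypothetical toy statistic: gap in mean outcome between the resampled groups.
    if g.min() == g.max():
        return 0.0
    return float(abs(Y[g == 1].mean() - Y[g == 0].mean()))


rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
Y = rng.normal(size=(300, 1))
e_hat = np.full(300, 0.5)
null = compute_null_sketch(mean_gap, e_hat, X, Y, null_reps=50, random_state=0)
print(null.shape)
```

Passing the same `random_state` reproduces the same null distribution, which makes resampling-based tests repeatable.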

abstract test(df, group_col, y_vars, x_vars)

Compute conditional discrepancy test.

Tests the null hypothesis \(P(Y \mid X, \text{group}) = P(Y \mid X)\), that is, whether Y is conditionally independent of the group indicator (which identifies the distribution each sample comes from), given X.

Equivalently, the test asks whether \(P_i(Y \mid X) = P_j(Y \mid X)\), where \(P_i(\cdot)\) and \(P_j(\cdot)\) denote the distributions of the different groups (or environments) indicated by group_col.
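Why condition on X at all? The small simulation below (an illustration, not part of dodiscover) builds two groups that differ only in \(P(X)\) while sharing the mechanism \(Y = 2X + \text{noise}\). The marginal distribution of Y then differs between groups, yet the conditional null \(P_0(Y \mid X) = P_1(Y \mid X)\) holds. For simplicity the residuals use the known mechanism rather than a fitted model.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Group 0 and group 1 differ in P(X) (covariate shift) ...
x0 = rng.normal(0.0, 1.0, size=n)
x1 = rng.normal(1.0, 1.0, size=n)
# ... but share the same mechanism P(Y | X): Y = 2 X + noise.
y0 = 2 * x0 + rng.normal(size=n)
y1 = 2 * x1 + rng.normal(size=n)

marginal_gap = abs(y1.mean() - y0.mean())                            # about 2: P(Y) differs
conditional_gap = abs((y1 - 2 * x1).mean() - (y0 - 2 * x0).mean())  # near 0: P(Y | X) agrees
print(f"marginal gap {marginal_gap:.2f}, conditional gap {conditional_gap:.2f}")
```

A test of \(P(Y \mid \text{group}) = P(Y)\) would reject here, while a conditional discrepancy test should not.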

Parameters:

df (pd.DataFrame): The dataframe containing the dataset.
group_col (column): The column in df indicating which group (distribution) each sample belongs to, encoded as 0 or 1.
y_vars (set of columns): The columns in df that form the outcome variables Y.
x_vars (set of columns, optional): The columns in df that form the conditioning variables X.

Returns:

Tuple[float, float]: The test statistic and the p-value.
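This page does not say how a subclass turns a null distribution into a p-value. A common convention for resampling-based tests, assumed here rather than confirmed by this page, is the one-sided estimate with a +1 correction, \(p = (1 + \#\{T_b \ge T\}) / (1 + B)\), where \(T\) is the observed statistic and \(T_1, \dots, T_B\) are the null replicates. The helper name `permutation_pvalue` is hypothetical.

```python
import numpy as np


def permutation_pvalue(stat: float, null_dist) -> float:
    """One-sided p-value with the +1 correction: (1 + #{null >= stat}) / (1 + B)."""
    null_dist = np.asarray(null_dist, dtype=float)
    return float((1 + np.sum(null_dist >= stat)) / (1 + null_dist.size))


print(permutation_pvalue(5.0, [1.0, 2.0, 6.0]))  # (1 + 1) / (1 + 3) = 0.5
```

The +1 in numerator and denominator keeps the estimate strictly positive, so a finite number of replicates never yields a p-value of exactly zero.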