2.1.2. dodiscover.ci.ClassifierCMITest

class dodiscover.ci.ClassifierCMITest(clf, metric=<function f_divergence_score>, bootstrap=False, n_iter=20, threshold=0.03, test_size=0.3, n_jobs=-1, n_shuffle_nbrs=5, n_shuffle=100, eps=1e-08, random_seed=None)

Methods

generate_train_test_data(df, x_vars, y_vars)

Generate a training and testing dataset for CCIT.

test(df, x_vars, y_vars[, z_covariates])

Test conditional independence by estimating CMI.

generate_train_test_data(df, x_vars, y_vars, z_covariates=None, k=1)

Generate a training and testing dataset for CCIT.

This converts a conditional independence problem on a given dataset into a binary classification problem.

Parameters:
df : pd.DataFrame

The dataframe containing the dataset.

x_vars : Set of column

A set of columns in df.

y_vars : Set of column

A set of columns in df.

z_covariates : Set, optional

A set of columns in df, by default None. If None, the test runs a standard (unconditional) independence test.

k : int

The number of nearest neighbors (K) in subspaces used in the conditional permutation step, which generates samples from a distribution that satisfies conditional independence. By default, 1.
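To make the role of k more concrete, the following is a minimal sketch of a generic nearest-neighbor conditional permutation, written in plain Python. It is an illustration under stated assumptions, not dodiscover's implementation: the function name `knn_conditional_shuffle`, the use of Euclidean distance, and the brute-force neighbor search are all hypothetical choices made for the example.

```python
import random


def knn_conditional_shuffle(y, z, k=1, seed=None):
    """Return a copy of y in which each y[i] is replaced by the y-value of one
    of the k nearest neighbors of sample i in z (sample i itself excluded).

    Hypothetical helper for illustration only; not part of dodiscover.
    """
    rng = random.Random(seed)
    n = len(y)
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < len(y)")

    shuffled = []
    for i in range(n):
        # Rank every other sample by squared Euclidean distance to z[i].
        others = [j for j in range(n) if j != i]
        others.sort(key=lambda j: sum((a - b) ** 2 for a, b in zip(z[i], z[j])))
        # Borrow the y-value of a randomly chosen donor among the k closest.
        donor = rng.choice(others[:k])
        shuffled.append(y[donor])
    return shuffled


# Two tight clusters in z: values are swapped only within each cluster.
z = [(0.0,), (0.1,), (5.0,), (5.1,)]
y = ["a", "b", "c", "d"]
print(knn_conditional_shuffle(y, z, k=1))  # ['b', 'a', 'd', 'c']
```

Because y-values are exchanged only between samples with similar z, the dependence of y on z is roughly preserved while any direct pairing between x and y is broken. A larger k mixes values over a wider neighborhood.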

Returns:
X_train, Y_train, X_test, Y_test : Tuple[array_like, array_like, array_like, array_like]

The X_train, y_train, X_test, y_test to be used in binary classification, where each dataset comprises samples from both the joint and the conditionally independent distributions. y_train and y_test contain only 1s and 0s. A value of 1 marks a sample from the original joint distribution, and a value of 0 marks a sample from the shuffled distribution.
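The labeling scheme described above can be sketched for the simplest case, where z_covariates is None and a plain permutation of y stands in for the shuffled distribution. This is an illustrative sketch only: the helper name `make_classification_data` and the way the rows are split are assumptions, and the library's actual procedure may differ.

```python
import random


def make_classification_data(x, y, test_size=0.3, seed=None):
    """Build a labeled dataset from paired samples (x[i], y[i]).

    Label 1: original pairs (joint distribution).
    Label 0: pairs with y permuted (pairing destroyed, mimicking independence).
    Hypothetical helper for illustration only; not part of dodiscover.
    """
    rng = random.Random(seed)

    # Original pairs keep their pairing and are labeled 1.
    joint = [((xi, yi), 1) for xi, yi in zip(x, y)]

    # Permuting y breaks the pairing with x; these rows are labeled 0.
    y_perm = list(y)
    rng.shuffle(y_perm)
    shuffled = [((xi, yi), 0) for xi, yi in zip(x, y_perm)]

    # Mix both classes, then split into test and train portions.
    rows = joint + shuffled
    rng.shuffle(rows)
    n_test = int(round(test_size * len(rows)))
    test_rows, train_rows = rows[:n_test], rows[n_test:]

    X_train = [features for features, _ in train_rows]
    y_train = [label for _, label in train_rows]
    X_test = [features for features, _ in test_rows]
    y_test = [label for _, label in test_rows]
    return X_train, y_train, X_test, y_test


x = list(range(10))
y = [2 * v for v in x]
X_train, y_train, X_test, y_test = make_classification_data(x, y, seed=0)
print(len(X_train), len(X_test))  # 14 6
```

If a classifier trained on such data cannot tell the two classes apart on the held-out rows, the data gives no evidence against independence.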

test(df, x_vars, y_vars, z_covariates=None)

Test conditional independence by estimating CMI.

Parameters:
df : pd.DataFrame

The dataframe containing the dataset.

x_vars : Set of column

A set of columns in df.

y_vars : Set of column

A set of columns in df.

z_covariates : Set, optional

A set of columns in df, by default None. If None, the test runs a standard (unconditional) independence test.

Returns:
Tuple[float, float]

The test statistic and p-value.
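A possible usage pattern is sketched below. It relies only on the constructor and test signatures listed on this page. Passing a scikit-learn RandomForestClassifier as clf is an assumption, as are the helper names `make_chain_data` and `run_example` and the simulated data. Third-party imports are kept inside `run_example` so that the data helper works on its own.

```python
import random


def make_chain_data(n=500, seed=0):
    """Simulate a chain x -> z -> y, so x and y are dependent marginally
    but independent once z is given. Hypothetical helper for illustration.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [xi + 0.5 * rng.gauss(0, 1) for xi in x]
    y = [zi + 0.5 * rng.gauss(0, 1) for zi in z]
    return {"x": x, "y": y, "z": z}


def run_example():
    """Run a marginal and a conditional test on the simulated chain.

    Requires pandas, scikit-learn, and dodiscover to be installed.
    """
    import pandas as pd
    from sklearn.ensemble import RandomForestClassifier
    from dodiscover.ci import ClassifierCMITest

    df = pd.DataFrame(make_chain_data())

    # Assumption: any scikit-learn style classifier is acceptable as clf.
    clf = RandomForestClassifier(n_estimators=50)
    ci_test = ClassifierCMITest(clf, random_seed=0)

    # Without z_covariates, this is a standard independence test of x and y.
    stat_marginal, pvalue_marginal = ci_test.test(df, {"x"}, {"y"})

    # With z_covariates, this tests x independent of y given z.
    stat_cond, pvalue_cond = ci_test.test(df, {"x"}, {"y"}, {"z"})

    return (stat_marginal, pvalue_marginal), (stat_cond, pvalue_cond)
```

Calling `run_example()` returns the two (statistic, p-value) pairs. For data generated this way, one would expect the marginal test to indicate dependence and the conditional test not to, though the outcome depends on the classifier, sample size, and settings such as threshold and n_shuffle.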