dodiscover.constraint.LearnSkeleton

class dodiscover.constraint.LearnSkeleton(ci_estimator, sep_set=None, alpha=0.05, min_cond_set_size=0, max_cond_set_size=None, max_combinations=None, condsel_method=ConditioningSetSelection.NBRS, keep_sorted=False, n_jobs=None)

Learn a skeleton graph from observational data without latent confounding.

For a Markovian causal model, this procedure can learn the skeleton graph completely.

Parameters:

ci_estimator (BaseConditionalIndependenceTest): The conditional independence test function.
sep_set (dictionary of dictionary of list of set): Mapping from each node to other nodes to the separating sets of variables. If None, an empty dictionary of dictionary of list of sets is initialized.
alpha (float, optional): The significance level for the conditional independence test, by default 0.05.
min_cond_set_size (int): The minimum number of variables used in the conditioning set, by default 0.
max_cond_set_size (int, optional): The maximum number of variables used in the conditioning set, by default None. Used to limit the computation spent on the algorithm.
max_combinations (int, optional): The maximum number of conditional independence tests to run from the set of possible conditioning sets. By default None, which means the algorithm checks all possible conditioning sets. If max_combinations=n is set, then for every conditioning set size ‘p’, at most ‘n’ CI tests are run before ‘p’ is incremented. To control the range of ‘p’, see min_cond_set_size and max_cond_set_size. This can be used in conjunction with the keep_sorted parameter to test only the “strongest” dependences.
condsel_method (ConditioningSetSelection): The method used to select the conditioning set, by default ConditioningSetSelection.NBRS. Must be one of (‘complete’, ‘neighbors’, ‘neighbors_path’). See Notes for more details.
keep_sorted (bool): Whether to keep the candidate conditioning set variables in sorted dependency order, by default False. If True, the existing dependencies of each variable are sorted from strongest to weakest (i.e. from the largest CI test statistic value to the smallest). This can be used in conjunction with the max_combinations parameter to test only the “strongest” dependences.
n_jobs (int, optional): Number of CPUs to use, by default None.
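
The way min_cond_set_size, max_cond_set_size, max_combinations and keep_sorted interact can be illustrated with a minimal sketch. The helper candidate_conditioning_sets and the test statistic values are hypothetical and not part of dodiscover. The sketch assumes that conditioning sets are enumerated by increasing size and that the cap is applied per size, as described above.

```python
from itertools import combinations, islice


def candidate_conditioning_sets(
    candidates,
    min_cond_set_size=0,
    max_cond_set_size=None,
    max_combinations=None,
):
    """Yield (size, conditioning_set) pairs in order of increasing size."""
    candidates = list(candidates)
    if max_cond_set_size is None:
        max_cond_set_size = len(candidates)
    upper = min(max_cond_set_size, len(candidates))
    for size in range(min_cond_set_size, upper + 1):
        # At most `max_combinations` sets are produced per size; None means all.
        for cond_set in islice(combinations(candidates, size), max_combinations):
            yield size, set(cond_set)


# Hypothetical test statistics of three candidate variables; sorting them from
# strongest to weakest mimics what keep_sorted=True is described as doing.
test_stats = {"b": 0.9, "c": 0.2, "d": 0.5}
ordered = sorted(test_stats, key=test_stats.get, reverse=True)

sets_by_size = list(
    candidate_conditioning_sets(ordered, max_cond_set_size=2, max_combinations=2)
)
for size, cond_set in sets_by_size:
    print(size, sorted(cond_set))
```

Because the candidates are sorted first, the cap of two sets per size keeps the sets built from the most strongly dependent variables and skips the rest.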

Notes

The algorithm proceeds by testing neighboring nodes while keeping track of the test statistic values, which indicate the pairs that are the “most dependent”. Each test evaluates the null hypothesis

\[H_0: X \perp Y | Z\]

where the alternative hypothesis is that they are dependent and hence require a causal edge linking the two variables.
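
As a minimal sketch of this decision rule, the snippet below assumes that an edge is removed as soon as some conditioning set yields a p-value above alpha. The function is_independent and the p-values are hypothetical and serve only as an illustration.

```python
def is_independent(pvalue, alpha=0.05):
    """Fail to reject H0 (independence) when the p-value exceeds alpha."""
    return pvalue > alpha


# Hypothetical p-values for one pair (X, Y) under three conditioning sets.
results = [
    (frozenset(), 0.001),
    (frozenset({"Z1"}), 0.03),
    (frozenset({"Z1", "Z2"}), 0.40),
]

# Conditioning sets under which X and Y appear independent.
separating = [cond_set for cond_set, pvalue in results if is_independent(pvalue)]

# The edge survives only if no conditioning set separates X and Y.
edge_kept = not separating
print(separating, edge_kept)
```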

Different methods for learning the skeleton:

There are different ways to learn the skeleton, each valid under different assumptions. The value of condsel_method determines how the conditioning set is selected.

‘complete’: Exhaustively conditions on all combinations of variables in the graph. This essentially corresponds to the SGS algorithm [1].

‘neighbors’: Conditions only on variables adjacent to ‘x_var’ and ‘y_var’. This corresponds to the traditional PC algorithm [2].

‘neighbors_path’: Same as ‘neighbors’, but restricted to variables with an adjacency path from ‘x_var’ to ‘y_var’. This is a variant from the RFCI paper [3].
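
The three selection strategies can be contrasted with a small sketch on a plain adjacency dictionary. The helper names are hypothetical, and the code is one possible reading of the descriptions above rather than the library's implementation. In particular, it draws candidates from the neighbors of ‘x_var’ only, and it treats "adjacency path" as a path to ‘y_var’ that does not pass back through ‘x_var’.

```python
from collections import deque


def reachable(adj, start, blocked):
    """Return the nodes reachable from `start` without visiting `blocked`."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nbr in adj[node]:
            if nbr not in seen and nbr != blocked:
                seen.add(nbr)
                queue.append(nbr)
    return seen


def candidate_variables(adj, x_var, y_var, method):
    """Return the variables from which conditioning sets would be drawn."""
    if method == "complete":
        # Every other variable in the graph is a candidate.
        return set(adj) - {x_var, y_var}
    nbrs = set(adj[x_var]) - {y_var}
    if method == "neighbors":
        return nbrs
    if method == "neighbors_path":
        # Keep neighbors of x_var that still connect to y_var when x_var is avoided.
        return {n for n in nbrs if y_var in reachable(adj, n, blocked=x_var)}
    raise ValueError(f"Unknown method: {method!r}")


# Hypothetical undirected graph: b hangs off x only, c is isolated.
adj = {
    "x": {"y", "a", "b"},
    "y": {"x", "a"},
    "a": {"x", "y"},
    "b": {"x"},
    "c": set(),
}

for method in ("complete", "neighbors", "neighbors_path"):
    print(method, sorted(candidate_variables(adj, "x", "y", method)))
```

In this graph the candidate pool shrinks from three variables to two to one, which is why the more restrictive strategies need fewer conditional independence tests.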

Attributes:

adj_graph_ (nx.Graph): The graph discovered from the data, stored as an undirected graph. Each edge carries two attributes: the smallest value of the test statistic encountered (key name ‘test_stat’) and the largest p-value seen when testing ‘x’ and ‘y’ for independence given some conditioning set (key name ‘pvalue’).
sep_set_ (dictionary of dictionary of list of set): Mapping from each node to other nodes to the separating sets of variables.
context_ (Context): The result context. Encodes causal assumptions.
min_cond_set_size_ (int): The inferred minimum conditioning set size.
max_cond_set_size_ (int): The inferred maximum conditioning set size.
max_combinations_ (int): The inferred maximum number of combinations of ‘Z’ to test per \(X \perp Y | Z\).
n_iters_ (int): The number of iterations performed while learning the skeleton.
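
The shape of sep_set_ and of the edge attributes can be sketched with plain Python containers. The helper functions are hypothetical, a plain dictionary stands in for the edge data of the graph, and the symmetric storage of separating sets is an assumption rather than something stated above.

```python
from collections import defaultdict

# Dictionary of dictionary of list of set: sep_set[x][y] -> [set, set, ...]
sep_set = defaultdict(lambda: defaultdict(list))


def record_separating_set(sep_set, x, y, cond_set):
    """Store a separating set for the pair (x, y) in both directions (assumed)."""
    sep_set[x][y].append(set(cond_set))
    sep_set[y][x].append(set(cond_set))


def update_edge_stats(edge_attrs, test_stat, pvalue):
    """Track the smallest test statistic and the largest p-value seen so far."""
    edge_attrs["test_stat"] = min(edge_attrs.get("test_stat", float("inf")), test_stat)
    edge_attrs["pvalue"] = max(edge_attrs.get("pvalue", 0.0), pvalue)


record_separating_set(sep_set, "x", "y", {"z"})

# Hypothetical results of two tests on the same edge.
edge_attrs = {}
update_edge_stats(edge_attrs, test_stat=3.2, pvalue=0.01)
update_edge_stats(edge_attrs, test_stat=1.1, pvalue=0.20)
print(dict(sep_set["x"]), edge_attrs)
```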

Methods

evaluate_edge(data, conditional_test_func, X, Y, Z=None)

Test any specific edge for \(X \perp Y | Z\).

learn_graph

ci_estimator: Callable[[Column, Column, Set[Column]], Tuple[float, float]]

evaluate_edge(data, conditional_test_func, X, Y, Z=None)

Test any specific edge for \(X \perp Y | Z\).

Parameters:

data (pd.DataFrame): The dataset.
X (column): A column in data.
Y (column): A column in data.
Z (set, optional): A set of columns in data, by default None.

Returns:

test_stat (float): Test statistic.
pvalue (float): The p-value.
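
To show what a (test_stat, pvalue) pair can look like, the sketch below implements a partial-correlation test with a Fisher z-transform in pure Python. It is a hypothetical stand-in, not the estimator shipped with dodiscover. It takes a plain dictionary of lists instead of a pd.DataFrame, and the function names and toy data are assumptions made for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equally long sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(data, x, y, z):
    """Partial correlation of x and y given the columns in z (recursive formula)."""
    z = list(z)
    if not z:
        return pearson(data[x], data[y])
    z0, rest = z[0], z[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z0, rest)
    r_yz = partial_corr(data, y, z0, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))


def fisher_z_test(data, x, y, z=None):
    """Return (test_stat, pvalue) for H0: x is independent of y given z."""
    z = set() if z is None else set(z)
    n = len(data[x])
    r = partial_corr(data, x, y, z)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite
    test_stat = math.sqrt(n - len(z) - 3) * abs(math.atanh(r))
    pvalue = math.erfc(test_stat / math.sqrt(2))  # two-sided normal p-value
    return test_stat, pvalue


# Toy data in which x and y both depend on z and are otherwise unrelated.
data = {
    "z": [-1, -1, -1, -1, 1, 1, 1, 1],
    "x": [-0.5, -1.5, -0.5, -1.5, 1.5, 0.5, 1.5, 0.5],
    "y": [-0.5, -0.5, -1.5, -1.5, 1.5, 1.5, 0.5, 0.5],
}

stat_marginal, p_marginal = fisher_z_test(data, "x", "y")
stat_cond, p_cond = fisher_z_test(data, "x", "y", {"z"})
print(p_marginal, p_cond)
```

With this data the marginal test rejects independence at alpha=0.05, while conditioning on ‘z’ does not, so ‘z’ would be a separating set for the pair.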