1. Constraint-based causal discovery#
Invariances in the data, such as conditional independence (CI), can be represented graphically in the form of d-separation statements under the causal faithfulness assumption. Given this, one is interested in high-powered and well-controlled statistical tests for CI in data.
The following are a set of methods intended for (nonparametric) structure learning of causal graphs (i.e. causal discovery) given observational and/or interventional data, by checking constraints in the form of conditional independences (CI). At a high level, all constraint-based causal discovery algorithms test CI statements, where different CI statistical tests are used to test the following null hypothesis:
\(H_0: X \perp Y \mid Z\) and \(H_A: X \not\perp Y \mid Z\)
for a given pair of nodes X and Y in the underlying causal graph of interest and a conditioning set Z. CI tests come with a variety of assumptions, and which test is better suited depends on the setting and the assumptions one can make about the data.
For more information on the CI tests themselves, see Independence.
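As a concrete illustration, the null hypothesis above can be tested with a Fisher-Z (partial correlation) test when the data are roughly Gaussian. The sketch below is a minimal, self-contained implementation for illustration only; dodiscover ships its own CI test classes, which should be preferred in practice.

```python
import numpy as np
from scipy import stats


def fisher_z_test(x, y, z=None):
    """Minimal Fisher-Z CI test of H0: x _||_ y | z (Gaussian data).

    x, y: 1-D arrays of n samples; z: optional (n, k) array of
    conditioning variables. Returns (test statistic, p-value).
    """
    n = len(x)
    if z is None or z.size == 0:
        r = np.corrcoef(x, y)[0, 1]
        k = 0
    else:
        # Partial correlation: residualize x and y on [1, z], correlate residuals.
        design = np.column_stack([np.ones(n), z])
        rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
        ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
        r = np.corrcoef(rx, ry)[0, 1]
        k = z.shape[1]
    zstat = 0.5 * np.log((1 + r) / (1 - r))   # Fisher's z-transform
    stat = np.sqrt(n - k - 3) * abs(zstat)
    pvalue = 2 * (1 - stats.norm.cdf(stat))
    return stat, pvalue


# X <- Z -> Y: dependent marginally, conditionally independent given Z.
rng = np.random.default_rng(0)
conf = rng.normal(size=2000)
x = conf + 0.5 * rng.normal(size=2000)
y = conf + 0.5 * rng.normal(size=2000)
_, p_marginal = fisher_z_test(x, y)
_, p_given_z = fisher_z_test(x, y, conf[:, None])
```

Here the marginal test rejects independence (the confounder induces a strong correlation), while conditioning on the confounder yields a non-significant p-value.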
1.1. Fundamental Assumptions of Constraint-Based Causal Discovery#
The fundamental assumptions of all algorithms in this section are the Markov property assumption and the causal faithfulness assumption [1] and [2].
The Markov assumption states that any d-separation statement implies a CI statement. This is a core assumption that users familiar with graphical modeling may recognize: it is the general assumption that connects graphical models to probabilistic models.
On the other hand, the causal faithfulness assumption states that every CI statement in the data maps to a d-separation statement. That is, there are no accidental CIs in the data that are not represented by a d-separation statement in the underlying causal graph. In theory, the causal faithfulness assumption is not an issue, due to a theoretical result showing that violations of faithfulness in causal diagrams have measure zero (i.e. they almost never occur).
However, in practice the causal faithfulness assumption is very problematic, because the data may exhibit only very weak dependence, such that a CI test at a specified \(\alpha\) level would fail to reject the null hypothesis and conclude that the variables in question are conditionally independent. Violations of “strong-faithfulness” occur frequently, and almost surely in higher dimensions [3].
Tackling violations of faithfulness in constraintbased causal discovery is a large and active area of research.
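To make the problem concrete, a classic faithfulness violation arises when two causal paths cancel exactly. In the sketch below (coefficients chosen by hand for illustration), X causes Y both directly and through Z, yet the population correlation between X and Y is zero, so a CI test would wrongly drop the X-Y edge.

```python
import numpy as np

# Faithfulness violation by path cancellation: X -> Z -> Y with path
# weight a*b = +1, plus a direct edge X -> Y with weight c = -1.
# The two paths cancel, so cov(X, Y) = 0 although X causes Y.
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
z = 1.0 * x + rng.normal(size=n)              # a = 1
y = 1.0 * z - 1.0 * x + rng.normal(size=n)    # b = 1, c = -1, a*b + c = 0
r_xy = np.corrcoef(x, y)[0, 1]
# |r_xy| is near zero: an "accidental" independence not implied
# by any d-separation statement in the true graph.
```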
1.2. (Nonparametric) Markovian SCMs with Observational Data#
If one assumes that the underlying structural causal model (SCM) is Markovian, then the Peter-Clark (PC) algorithm has been shown to be sound and complete for learning a completed partially directed acyclic graph (CPDAG) [4].
The dodiscover.constraint.PC algorithm and its variants assume Markovianity, which is also known as causal sufficiency in the literature. In other words, it assumes a lack of latent confounders: there is no unobserved variable that confounds any pair of the observed variables.
The PC algorithm learns a CPDAG in three stages:
- skeleton discovery: This first phase leverages CI tests to test edges for conditional independence. Along the way, edges of the graph are trimmed (when CI is detected) and the separating sets among pairs of variables are tracked. A separating set is a set of nodes in the graph that d-separates a pair of variables. Note that a pair of variables may have many d-separators, and thus there may be many separating sets.
- unshielded triplet orientation: This takes triplets on a path of the form X *-* Y *-* Z, where the triplet path is “unshielded”, meaning X and Z are not connected. It then checks that Y is not in the separating set of X and Z. Given these two conditions, Y must be a collider, and the path is oriented as X *-> Y <-* Z. The stars in the path indicate that the endpoint can be of any kind (e.g. in a PAG it could be a circle endpoint).
- deterministic path orientations: Once all colliders are oriented, a set of deterministic logical rules allows us to orient more edges. In the PC algorithm, these are the so-called “Meek orientation rules”: four rules that are applied repeatedly until no more changes to the graph are made [4].
The resulting graph is an equivalence class of DAGs without latent confounders, the CPDAG.
For more information on CPDAGs, one can also see pywhy_graphs.CPDAG.
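The skeleton discovery phase described above can likewise be sketched compactly. This toy version (our own simplified code, not the dodiscover implementation) starts from a complete graph and removes an edge as soon as some conditioning set renders its endpoints independent, recording that set as the separating set; a hand-written oracle for the chain X -> Z -> Y stands in for a statistical CI test.

```python
from itertools import combinations


def learn_skeleton(nodes, ci_test, max_cond_size=None):
    """Skeleton discovery sketch (phase 1 of PC).

    ci_test(x, y, z) returns True when x _||_ y | z.
    Returns (adjacency dict, separating sets).
    """
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    limit = max_cond_size if max_cond_size is not None else len(nodes) - 2
    size = 0
    while size <= limit:
        for x in nodes:
            for y in list(adj[x]):
                others = adj[x] - {y}
                if len(others) < size:
                    continue
                # Try all conditioning sets of the current size drawn
                # from the remaining neighbors of x.
                for z in combinations(sorted(others), size):
                    if ci_test(x, y, set(z)):
                        adj[x].discard(y)
                        adj[y].discard(x)
                        sepset[frozenset((x, y))] = set(z)
                        break
        size += 1
    return adj, sepset


# Oracle for the chain X -> Z -> Y: the only CI is X _||_ Y | {Z}.
def oracle(x, y, z):
    return frozenset((x, y)) == frozenset(("X", "Y")) and "Z" in z


skeleton, seps = learn_skeleton(["X", "Z", "Y"], oracle)
# Resulting skeleton: X - Z - Y, with the X-Y edge removed and sepset {Z}.
```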
1.3. (Nonparametric) Semi-Markovian SCMs with Observational Data#
If one assumes that the underlying SCM is semi-Markovian, then the “Fast Causal Inference” (FCI) algorithm has been shown to be sound and complete for learning a partial ancestral graph (PAG) [5] [6].
The FCI algorithm and its variants assume semi-Markovianity, which allows for the possible presence of latent confounders and even selection bias in the observational data.
The dodiscover.constraint.FCI algorithm follows the same three stages of learning as the PC algorithm, but with a few modifications that we outline here:
- skeleton discovery: The skeleton discovery phase is now composed of two stages. The first stage is the same as in the PC algorithm. The second stage takes the output graph of the first stage and tries to orient colliders. This results in a PAG that can be queried for the potentially d-separating (PDS) sets of any pair of variables (X, Y). The skeleton discovery phase is then restarted from scratch, but now the conditioning sets are chosen from the PDS sets. The PDS-set approach is described in [7] and [1].
- deterministic path orientations: The four orientation rules of the PC algorithm remain the same, but in the FCI case we add an additional six orientation rules. The additional rules account for latent confounding and selection bias; three of them only apply if selection bias is assumed to be present.
1.4. (Nonparametric) SCMs with Interventional Data#
When we have access to experimental data, i.e. multiple datasets corresponding to multiple distributions (e.g. observational and different interventions), we can improve causal discovery. If one knows the targets of each intervention, one can apply the I-FCI algorithm to learn an interventional PAG (I-PAG) [8].
Alternatively, one may not know where the intervention was applied in each distribution. In this case, one may apply the \(\Psi\)-FCI algorithm to learn a \(\Psi\)-PAG [9].

Figure: the interventional (\(\Psi\)-)FCI algorithm.
1.5. Choosing the conditioning sets#
We briefly describe how dodiscover chooses the conditioning sets Z that are tested for a given pair of nodes (X, Y). The test performed is \(X \perp Y \mid Z\), where Z may be the empty set. There are multiple strategies for choosing Z.
Available methods for selecting the conditioning sets when learning a skeleton. 
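As an illustration of one common strategy (the PC-style choice, shown here as a generic sketch rather than dodiscover's own API), candidate conditioning sets for a pair (X, Y) can be drawn from the current neighbors of X, excluding Y, in order of increasing size:

```python
from itertools import combinations


def candidate_conditioning_sets(adj, x, y, size):
    """Enumerate candidate sets Z of the given size drawn from the
    neighbors of x, excluding y (PC-style strategy sketch)."""
    return [set(z) for z in combinations(sorted(adj[x] - {y}), size)]


# Toy skeleton: X is adjacent to A, B and Y.
adj = {"X": {"A", "B", "Y"}}
sets_size1 = candidate_conditioning_sets(adj, "X", "Y", 1)  # [{'A'}, {'B'}]
sets_size2 = candidate_conditioning_sets(adj, "X", "Y", 2)  # [{'A', 'B'}]
```

Enumerating by increasing size keeps early tests cheap and well-powered; larger conditioning sets are only tried if the edge survives the smaller ones.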
1.6. Hyperparameters and controlling overfitting#
To describe.
1.7. Robust learning#
Conservative orientations, etc.