{
"cells": [
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"%matplotlib inline"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n\n# Order-based algorithms for causal discovery from observational data without latent confounders\n\nWe will simulate some observational data from a Structural Causal Model (SCM) and\ndemonstrate how we will use the SCORE algorithm. The example can be easily adapted\nto CAM, DAS and NoGAM methods.\n\nCAM, SCORE, DAS, NoGAM algorithms perform causal discovery in a two steps procedure.\nGiven observational i.i.d. data from an Additive Noise Model without\nlatent confounders, first the method estimates a topological ordering of the causal variables.\nThis partial ordering can be represented as a fully connected graph, where every node has an\nincoming edge from all its predecessors in the ordering. Second, the resulting fully connected\nDAG is pruned by some variable selection procedure.\nNote that knowing the topological ordering is already sufficient for estimating causal effects.\nNevertheless, the pruning step is justified by the fact that operating with a sparser graph is\nstatistically more efficient.\n\nThe four methods differ as follow:\n\n* In CAM algorithm the topological ordering is inferred by finding the permutation of the graph\n nodes corresponding to the fully connected graph that maximizes the log-likelihood of the data.\n After inference of the topological ordering, the pruning step is done by variable selection with\n regression. In particular, for each variable ``j`` CAM fits a generalized additive model using as\n covariates all the predecessor of ``j`` in the ordering, and performs hypothesis testing to select\n relevant parent variables.\n* SCORE provides a more efficient topological ordering than CAM, while it inherits the pruning\n procedure. In order to infer the topological ordering, SCORE estimates the Hessian matrix of the\n log-likelihood. Then, it finds a leaf (i.e. a node without children) by taking the ``argmin`` of\n the variance over the diagonal elements of the Hessian matrix. Once a leaf is found, it is\n removed from the graph and the procedure is iteratively repeated, evantually assigning a position\n to each node.\n* DAS provides a more efficient pruning step, while it inherits the ordering method from SCORE.\n Let ``H`` be the Hessian matrix of the log-likelihood: given a leaf node ``j``, DAS selects an\n edge ``i -> j`` if the pair satisfies ``mean(abs(H[i, j])) = 0``. Vanishing mean is verified by\n hypothesis testing. Finally, CAM-pruning is applied on the resulting sparse graph, in order to\n further reduce the number of false positives in the inferred DAG. Sparsity ensures linear\n computational complexity of this final pruning step. (DAS can be seen as an efficient version of\n SCORE, with better scaling properties in the graph size.)\n* NoGAM introduces a topological ordering procedure that does not assume any distribution\n of the noise terms, whereas CAM, SCORE and DAS all require the noise to be Gaussian.\n The pruning of the graph is done via CAM procedure. In order to define the topological order,\n NoGAM identifies one leaf at the time: first, for each node in the graph, it estimates the\n residuals of the regression problem that predicts a variable ``j`` from all the remaining nodes\n ``1, 2, .., j-1, j+1, .., |V|`` (with ``|V|`` the number of nodes). Then, NoGAM tries to estimate\n each entry ``j`` of the vector of the gradient of log-likelihood using the residual of the\n variable ``j`` as covariate: a leaf is found by selection of the ``argmin`` of the mean squared\n error of the predictions.\n\n\n.. currentmodule:: dodiscover\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Authors: Francesco Montagna \n\nLicense: BSD (3-clause)\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"import numpy as np\nimport networkx as nx\nfrom scipy import stats\nimport pandas as pd\nfrom pywhy_graphs.viz import draw\nfrom dodiscover import make_context\nfrom dodiscover.toporder.score import SCORE\nfrom dowhy import gcm\nfrom dowhy.gcm.util.general import set_random_seed"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Simulate some data\nFirst we will simulate data, starting from an Additive Noise Model (ANM).\nThis will then induce a causal graph, which we can visualize.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"# set a random seed to make example reproducible\nseed = 12345\nrng = np.random.RandomState(seed=seed)\n\n\nclass MyCustomModel(gcm.PredictionModel):\n def __init__(self, coefficient):\n self.coefficient = coefficient\n\n def fit(self, X, Y):\n # Nothing to fit here, since we know the ground truth.\n pass\n\n def predict(self, X):\n return self.coefficient * X\n\n def clone(self):\n # We don't really need this actually.\n return MyCustomModel(self.coefficient)\n\n\n# set a random seed to make example reproducible\nset_random_seed(1234)\n\n# construct a causal graph that will result in\n# x -> y <- z -> w\nG = nx.DiGraph([(\"x\", \"y\"), (\"z\", \"y\"), (\"z\", \"w\")])\n\ncausal_model = gcm.ProbabilisticCausalModel(G)\ncausal_model.set_causal_mechanism(\"x\", gcm.ScipyDistribution(stats.binom, p=0.5, n=1))\ncausal_model.set_causal_mechanism(\"z\", gcm.ScipyDistribution(stats.binom, p=0.9, n=1))\ncausal_model.set_causal_mechanism(\n \"y\",\n gcm.AdditiveNoiseModel(\n prediction_model=MyCustomModel(1),\n noise_model=gcm.ScipyDistribution(stats.binom, p=0.8, n=1),\n ),\n)\ncausal_model.set_causal_mechanism(\n \"w\",\n gcm.AdditiveNoiseModel(\n prediction_model=MyCustomModel(1),\n noise_model=gcm.ScipyDistribution(stats.binom, p=0.5, n=1),\n ),\n)\n\n# Fit here would not really fit parameters, since we don't do anything in the fit method.\n# Here, we only need this to ensure that each FCM has the correct local hash (i.e., we\n# get an inconsistency error if we would modify the graph afterwards without updating\n# the FCMs). Having an empty data set is a small workaround, since all models are\n# pre-defined.\ngcm.fit(causal_model, pd.DataFrame(columns=[\"x\", \"y\", \"z\", \"w\"]))\n\n# sample the observational data\ndata = gcm.draw_samples(causal_model, num_samples=500)\n\nprint(data.head())\nprint(pd.Series({col: data[col].unique() for col in data}))\ndot_graph = draw(G)\ndot_graph.render(outfile=\"oracle_dag.png\", view=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Define the context\nIn PyWhy, we introduce the :class:`context.Context` class, which should be a departure from\n\"data-first causal discovery,\" where users provide data as the primary input to a\ndiscovery algorithm. This problem with this approach is that it encourages novice\nusers to see the algorithm as a philosopher's stone that converts data to causal\nrelationships. With this mindset, users tend to surrender the task of providing\ndomain-specific assumptions that enable identifiability to the algorithm. In\ncontrast, PyWhy's key strength is how it guides users to specifying domain\nassumptions up front (in the form of a DAG) before the data is added, and\nthen addresses identifiability given those assumptions and data. In this sense,\nthe Context class houses both data, apriori assumptions and other relevant data\nthat may be used in downstream structure learning algorithms.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"context = make_context().variables(data=data).build()\n\n# Alternatively, one could specify some fixed edges.\n\n# .. code-block::Python\n# included_edges = nx.Graph([('x', 'y')])\n# context = make_context().edges(include=included_edges).build()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Run structure learning algorithm\nNow we are ready to run the SCORE algorithm. The method performs inference\nin two phases. First it estimates the topological order of the nodes in the\ngraphs. This is done iteratively according to the following procedure:\n\n1. SCORE estimates the Hessian of the logarithm of $p(V)$,\n with $p(V)$ the joint distribution of the nodes in the graph.\n2. Let `H := Hessian(log p(V))`. SCORE selects a leaf in the graph by finding\n the diagonal term of H with minimum variance,\n i.e. by computing `np.argmin(np.var(np.diag(H))`.\n3. SCORE removes the leaf from the graph, and repeats steps from 1. to 3.\n iteratively up to the source nodes.\n\nGiven the inferred topological order, SCORE prunes the graph with all the\nedges admitted by such ordering, by doing sparse regression to choose the\nrelevant variables. Variable selection is done by thresholding on the\np-values of the coefficients associated to the potential parents of a node.\nFor instance, consider a graph $G$ with 3 vertices $V = \\{1, 2, 3\\}$. For\nsimplicity let the topological order be trivial, i.e. $\\{1, 2, 3\\}$. The unique\nfully connected adjacency matrix compatible with such ordering is the upper\ntriangular matrix `np.triu(np.ones((3, 3)), k=1)` with all ones above the\ndiagonal.\n\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"collapsed": false
},
"outputs": [],
"source": [
"score = SCORE() # or DAS() or NoGAM() or CAM()\nscore.learn_graph(data, context)\n\n# SCORE estimates a directed acyclic graph (DAG) and the topoological order\n# of the nodes in the graph. SCORE is consistent in the infinite samples\n# limit, meaning that it might return faulty estimates due to the finiteness\n# of the data.\ngraph = score.graph_\norder_graph = score.order_graph_\n\n# `score_full_dag.png` visualizes the fully connected DAG representation of\n# the inferred topological ordering.\n# `score_dag.png` visualizes the fully connected DAG after pruning with\n# sparse regression.\ndot_graph = draw(graph, name=\"DAG after pruning\")\ndot_graph.render(outfile=\"score_dag.png\", view=True)\n\ndot_graph = draw(order_graph, name=\"Fully connected DAG\")\ndot_graph.render(outfile=\"score_full_dag.png\", view=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Summary\nWe observe two DAGs output of the SCORE inference procedure.\nOne is the fully connected graph associated to the inferred topological order\n`[z, x, y, w]` of the graph nodes. The other is the sparser graph after the pruning\nstep, corresponding to the causal graph inferred by SCORE.\n\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.9.19"
}
},
"nbformat": 4,
"nbformat_minor": 0
}