Getting Started

Installation

The simplest way to install DoWhy is through pip or conda:

Using pip:

pip install dowhy

Using conda:

conda install -c conda-forge dowhy

Further installation scenarios and instructions can be found on the Installation page.

“Hello causal inference world”

In this section, we show the “Hello world” version of DoWhy. DoWhy is based on a simple, unified language for causal inference that combines two powerful frameworks: graphical causal models (GCM) and potential outcomes (PO). It uses graph-based criteria and do-calculus to model assumptions and to identify a non-parametric causal effect.
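As an example of what graph-based identification can produce, suppose an observed set of variables W blocks every backdoor path from a treatment T to an outcome Y (the backdoor criterion). In that case, the interventional distribution is identified non-parametrically from observational data by the standard backdoor adjustment formula. The symbols T, Y, and W are introduced here only for illustration:

```latex
P\big(Y \mid do(T = t)\big) = \sum_{w} P(Y \mid T = t, W = w)\, P(W = w)

\mathbb{E}\big[Y \mid do(T = 1)\big] - \mathbb{E}\big[Y \mid do(T = 0)\big]
  = \sum_{w} \big(\mathbb{E}[Y \mid T = 1, W = w] - \mathbb{E}[Y \mid T = 0, W = w]\big)\, P(W = w)
```

The right-hand sides contain only conditional and marginal distributions of observed variables. This is why a statistical estimator can target them once the graph justifies the adjustment.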

To get you started, we introduce two of the many features DoWhy offers.

Effect inference

For effect estimation, DoWhy switches to methods based primarily on potential outcomes. It offers a simple four-step recipe consisting of modeling, identification, estimation, and refutation:

from dowhy import CausalModel
import dowhy.datasets

# Generate some sample data
data = dowhy.datasets.linear_dataset(
    beta=10,
    num_common_causes=5,
    num_instruments=2,
    num_samples=10000)

# Step 1: Create a causal model from the data and given graph.
model = CausalModel(
    data=data["df"],
    treatment=data["treatment_name"],
    outcome=data["outcome_name"],
    graph=data["gml_graph"])

# Step 2: Identify causal effect and return target estimands
identified_estimand = model.identify_effect()

# Step 3: Estimate the target estimand using a statistical method.
estimate = model.estimate_effect(identified_estimand,
                                 method_name="backdoor.propensity_score_matching")

# Step 4: Refute the obtained estimate using multiple robustness checks.
refute_results = model.refute_estimate(identified_estimand, estimate,
                                       method_name="random_common_cause")
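To build some intuition for Steps 2 to 4 without relying on DoWhy itself, here is a minimal NumPy sketch under simplifying assumptions. It uses a single binary confounder W, a linear outcome model, and regression adjustment on W in place of propensity score matching. The last step mimics the idea behind the random common cause check: add an independent random covariate and confirm that the estimate barely moves. This is not DoWhy's implementation, and the data and the helper `ols_treatment_coef` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
true_effect = 10.0

# Hypothetical data: binary confounder w raises both the treatment probability and the outcome.
w = rng.binomial(1, 0.5, size=n)
t = rng.binomial(1, np.where(w == 1, 0.8, 0.2))
y = true_effect * t + 5.0 * w + rng.normal(size=n)


def ols_treatment_coef(y, t, *covariates):
    """Return the coefficient on t from a least-squares fit of y on [1, t, covariates]."""
    design = np.column_stack([np.ones_like(t, dtype=float), t, *covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[1]


# Naive comparison ignores w, so it is biased (treated units have w = 1 more often).
naive = y[t == 1].mean() - y[t == 0].mean()

# Backdoor adjustment: condition on w, the valid adjustment set in this toy graph.
adjusted = ols_treatment_coef(y, t, w)

# Random-common-cause style check: add an independent random covariate and re-estimate.
random_cov = rng.normal(size=n)
refuted = ols_treatment_coef(y, t, w, random_cov)

print(f"naive={naive:.2f}  adjusted={adjusted:.2f}  with random cause={refuted:.2f}")
```

Under this data-generating process, the naive difference is biased upward by about 3, that is, 5 × (0.8 − 0.2). The adjusted estimate should land close to the true effect of 10, and adding the irrelevant random covariate should leave it essentially unchanged.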

To understand what these four steps mean (and why we need four steps), the best place to learn more is the user guide’s Estimating Causal Effects chapter. Alternatively, you can dive into the code and explore basic features in Basic Example for Calculating the Causal Effect.

For estimation of conditional effects, you can also use methods from EconML through the same API; see Conditional Average Treatment Effects (CATE) with DoWhy and EconML.
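Conditional effects let the treatment effect vary with covariates. As a minimal illustration only, assume a randomized binary treatment and a single effect modifier x that changes the effect linearly. Under those assumptions, a CATE can be estimated with an interaction term in ordinary least squares. This is a deliberately simple stand-in, not one of EconML's estimators, and the data and the `cate` helper are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# Hypothetical data: the true treatment effect is 2 + 3 * x, so it grows with the modifier x.
x = rng.normal(size=n)
t = rng.binomial(1, 0.5, size=n)  # randomized treatment, so no confounding
y = 1.0 + 0.5 * x + (2.0 + 3.0 * x) * t + rng.normal(size=n)

# Fit y ~ 1 + x + t + t*x; the coefficients on t and t*x describe the CATE.
design = np.column_stack([np.ones(n), x, t, t * x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
cate_intercept, cate_slope = coef[2], coef[3]


def cate(x_value):
    """Estimated conditional average treatment effect at x = x_value."""
    return cate_intercept + cate_slope * x_value


print(f"CATE(x=-1)={cate(-1.0):.2f}  CATE(x=0)={cate(0.0):.2f}  CATE(x=1)={cate(1.0):.2f}")
```

Under these assumptions, the estimated effect should be near -1, 2, and 5 at x = -1, 0, and 1. A single average effect would hide that variation.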

Graphical causal model-based inference

For features such as root cause analysis, point-wise counterfactual inference, and structural analysis, DoWhy uses graphical causal models. The language of graphical causal models supports a wide variety of causal questions, and DoWhy’s API for answering them follows a simple three-step recipe:

import networkx as nx, numpy as np, pandas as pd
from dowhy import gcm

# Let's generate some "normal" data we assume we're given from our problem domain:
X = np.random.normal(loc=0, scale=1, size=1000)
Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
data = pd.DataFrame(dict(X=X, Y=Y, Z=Z))

# Step 1: Model our system:
causal_model = gcm.StructuralCausalModel(nx.DiGraph([('X', 'Y'), ('Y', 'Z')]))
gcm.auto.assign_causal_mechanisms(causal_model, data)

# Step 2: Train our causal model with the data from above:
gcm.fit(causal_model, data)

# Step 3: Perform a causal analysis. For instance, root cause analysis, where we observe
anomalous_sample = pd.DataFrame(dict(X=[0.1], Y=[6.2], Z=[19]))  # Here, Y is the root cause.
# ... and would like to answer the question:
# "Which node is the root cause of the anomaly in Z?":
anomaly_attribution = gcm.attribute_anomalies(causal_model, "Z", anomalous_sample)

If you want to learn more about this and other GCM features, we recommend starting with Modeling Graphical Causal Models (GCMs) in the user guide or checking out Basic Example for Graphical Causal Models.
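Why is Y singled out in the root cause example above? A deliberately simplified version of the idea helps. In an invertible linear structural causal model, each node's noise term can be recovered as its value minus the part explained by its parents. The node whose recovered noise is most extreme is then a natural root-cause candidate. DoWhy's `attribute_anomalies` is more involved: it attributes an anomaly score of the target node to the noise terms of upstream nodes. The sketch below uses only NumPy least squares and a hypothetical helper `fit_linear`, and it is not DoWhy's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Same data-generating process as the example above: X -> Y -> Z.
X = rng.normal(size=n)
Y = 2 * X + rng.normal(size=n)
Z = 3 * Y + rng.normal(size=n)


def fit_linear(child, parent):
    """Least-squares fit of child ~ a + b * parent; returns (a, b)."""
    design = np.column_stack([np.ones_like(parent), parent])
    (a, b), *_ = np.linalg.lstsq(design, child, rcond=None)
    return a, b


a_y, b_y = fit_linear(Y, X)
a_z, b_z = fit_linear(Z, Y)

# Spread of each node's noise term, estimated from the training residuals.
noise_std = {
    "X": X.std(),
    "Y": (Y - a_y - b_y * X).std(),
    "Z": (Z - a_z - b_z * Y).std(),
}

# The anomalous observation from the example above.
x0, y0, z0 = 0.1, 6.2, 19.0

# Recover each node's noise for this sample, standardized by its training spread.
standardized_noise = {
    "X": (x0 - X.mean()) / noise_std["X"],
    "Y": (y0 - a_y - b_y * x0) / noise_std["Y"],
    "Z": (z0 - a_z - b_z * y0) / noise_std["Z"],
}

root_cause = max(standardized_noise, key=lambda node: abs(standardized_noise[node]))
print(standardized_noise, "->", root_cause)
```

Here Y's recovered noise is about 6 standard deviations (6.2 − 2 × 0.1), while Z's value is almost fully explained by its parent (19 ≈ 3 × 6.2). That is why Y, not Z, is flagged.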

Further resources

There are further resources available:

An introductory tutorial on causal inference
A comprehensive tutorial on Causal Inference and Counterfactual Reasoning at the ACM Knowledge Discovery and Data Mining 2018 conference
A video introduction to the four steps of causal inference and its implications for machine learning from Microsoft Research: Foundations of causal inference and its impacts on machine learning
The PDF book Elements of Causal Inference
Draft chapters of an upcoming book: Causal reasoning: Fundamentals and machine learning applications
A blog post describing one of DoWhy’s root cause analysis algorithms via graphical causal models: New method identifies the root causes of statistical outliers