DoWhy: Interpreters for Causal Estimators#

This is a quick introduction to the use of interpreters in the DoWhy causal inference library. We will load in a sample dataset, use different methods for estimating the causal effect of a (pre-specified)treatment variable on a (pre-specified) outcome variable and demonstrate how to interpret the obtained results.

First, let us add the required path for Python to find the DoWhy code and load all required packages

[1]:

%load_ext autoreload
%autoreload 2

[2]:

import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy import CausalModel
import dowhy.datasets

Now, let us load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome.

Beta is the true causal effect.

[3]:

data = dowhy.datasets.linear_dataset(beta=1,
        num_common_causes=5,
        num_instruments = 2,
        num_treatments=1,
        num_discrete_common_causes=1,
        num_samples=10000,
        treatment_is_binary=True,
        outcome_is_binary=False)
df = data["df"]
print(df[df.v0==True].shape[0])
df

[3]:

	Z0	Z1	W0	W1	W2	W3	W4	v0	y
0	1.0	0.329924	-0.562990	-1.152751	-1.242643	-0.036660	2	True	0.339609
1	0.0	0.049015	0.706824	-0.076929	0.012096	-1.159813	1	False	0.412772
2	1.0	0.831420	3.651132	-0.852035	-0.233533	-0.183871	0	True	1.141385
3	0.0	0.309006	-0.638091	-1.556755	-0.411045	-2.245108	0	False	-1.583674
4	1.0	0.558592	2.421711	1.254029	-0.275196	0.624431	3	True	3.556725
...	...	...	...	...	...	...	...	...	...
9995	0.0	0.273608	1.020261	1.202197	0.359198	-0.939972	1	True	2.535190
9996	0.0	0.515192	-1.180266	-0.213392	-1.473312	0.300800	0	True	-0.000094
9997	1.0	0.278973	0.379144	0.835538	-1.293364	-2.738617	2	True	1.712016
9998	0.0	0.446293	0.772531	-0.357786	-0.517827	2.257053	2	True	1.674568
9999	0.0	0.923014	1.150692	0.219845	-0.530762	-1.006868	2	True	1.925502

10000 rows × 9 columns

Note that we are using a pandas dataframe to load the data.

Identifying the causal estimand#

We now input a causal graph in the GML graph format.

[4]:

# With graph
model=CausalModel(
        data = df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
        instruments=data["instrument_names"]
        )

[5]:

model.view_model()

../_images/example_notebooks_dowhy_interpreter_9_0.png

[6]:

from IPython.display import Image, display
display(Image(filename="causal_model.png"))

../_images/example_notebooks_dowhy_interpreter_10_0.png

We get a causal graph. Now identification and estimation is done.

[7]:

identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
print(identified_estimand)

Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W3,W2,W4,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W3,W2,W4,W0,U) = P(y|v0,W1,W3,W2,W4,W0)

### Estimand : 2
Estimand name: iv
Estimand expression:
 ⎡                              -1⎤
 ⎢    d        ⎛    d          ⎞  ⎥
E⎢─────────(y)⋅⎜─────────([v₀])⎟  ⎥
 ⎣d[Z₁  Z₀]    ⎝d[Z₁  Z₀]      ⎠  ⎦
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z1,Z0})
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→{v0}, then ¬({Z1,Z0}→y)

### Estimand : 3
Estimand name: frontdoor
No such variable(s) found!

Method 1: Propensity Score Stratification#

We will be using propensity scores to stratify units in the data.

[8]:

causal_estimate_strat = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_stratification",
                                              target_units="att")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W3,W2,W4,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W3,W2,W4,W0,U) = P(y|v0,W1,W3,W2,W4,W0)

## Realized estimand
b: y~v0+W1+W3+W2+W4+W0
Target units: att

## Estimate
Mean value: 0.990092471917651

Causal Estimate is 0.990092471917651

Textual Interpreter#

The textual Interpreter describes (in words) the effect of unit change in the treatment variable on the outcome variable.

[9]:

# Textual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 0.990092471917651 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

Visual Interpreter#

The visual interpreter plots the change in the standardized mean difference (SMD) before and after Propensity Score based adjustment of the dataset. The formula for SMD is given below.

\(SMD = \frac{\bar X_{1} - \bar X_{2}}{\sqrt{(S_{1}^{2} + S_{2}^{2})/2}}\)

Here, \(\bar X_{1}\) and \(\bar X_{2}\) are the sample mean for the treated and control groups.

[10]:

# Visual Interpreter
interpretation = causal_estimate_strat.interpret(method_name="propensity_balance_interpreter")

/home/runner/work/dowhy/dowhy/dowhy/interpreters/propensity_balance_interpreter.py:43: FutureWarning: The provided callable <function mean at 0x7fbbccdfea60> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  mean_diff = df_long.groupby(self.estimate._treatment_name + ["common_cause_id", "strata"]).agg(
/home/runner/work/dowhy/dowhy/dowhy/interpreters/propensity_balance_interpreter.py:57: FutureWarning: The provided callable <function std at 0x7fbbccdfeb80> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  stddev_by_w_strata = df_long.groupby(["common_cause_id", "strata"]).agg(stddev=("W", np.std)).reset_index()
/home/runner/work/dowhy/dowhy/dowhy/interpreters/propensity_balance_interpreter.py:63: FutureWarning: The provided callable <function sum at 0x7fbbcce7aaf0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  mean_diff_strata.groupby("common_cause_id").agg(std_mean_diff=("scaled_mean", np.sum)).reset_index()
/home/runner/work/dowhy/dowhy/dowhy/interpreters/propensity_balance_interpreter.py:67: FutureWarning: The provided callable <function mean at 0x7fbbccdfea60> is currently using SeriesGroupBy.mean. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "mean" instead.
  mean_diff_overall = df_long.groupby(self.estimate._treatment_name + ["common_cause_id"]).agg(
/home/runner/work/dowhy/dowhy/dowhy/interpreters/propensity_balance_interpreter.py:74: FutureWarning: The provided callable <function std at 0x7fbbccdfeb80> is currently using SeriesGroupBy.std. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "std" instead.
  stddev_overall = df_long.groupby(["common_cause_id"]).agg(stddev=("W", np.std)).reset_index()

../_images/example_notebooks_dowhy_interpreter_18_1.png

This plot shows how the SMD decreases from the unadjusted to the stratified units.

Method 2: Propensity Score Matching#

We will be using propensity scores to match units in the data.

[11]:

causal_estimate_match = model.estimate_effect(identified_estimand,
                                              method_name="backdoor.propensity_score_matching",
                                              target_units="atc")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W3,W2,W4,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W3,W2,W4,W0,U) = P(y|v0,W1,W3,W2,W4,W0)

## Realized estimand
b: y~v0+W1+W3+W2+W4+W0
Target units: atc

## Estimate
Mean value: 0.9941094322210723

Causal Estimate is 0.9941094322210723

[12]:

# Textual Interpreter
interpretation = causal_estimate_match.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 0.9941094322210723 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

Cannot use propensity balance interpretor here since the interpreter method only supports propensity score stratification estimator.

Method 3: Weighting#

We will be using (inverse) propensity scores to assign weights to units in the data. DoWhy supports a few different weighting schemes:

Vanilla Inverse Propensity Score weighting (IPS) (weighting_scheme=”ips_weight”)
Self-normalized IPS weighting (also known as the Hajek estimator) (weighting_scheme=”ips_normalized_weight”)
Stabilized IPS weighting (weighting_scheme = “ips_stabilized_weight”)

[13]:

causal_estimate_ipw = model.estimate_effect(identified_estimand,
                                            method_name="backdoor.propensity_score_weighting",
                                            target_units = "ate",
                                            method_params={"weighting_scheme":"ips_weight"})
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))

*** Causal Estimate ***

## Identified estimand
Estimand type: EstimandType.NONPARAMETRIC_ATE

### Estimand : 1
Estimand name: backdoor
Estimand expression:
  d
─────(E[y|W1,W3,W2,W4,W0])
d[v₀]
Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W3,W2,W4,W0,U) = P(y|v0,W1,W3,W2,W4,W0)

## Realized estimand
b: y~v0+W1+W3+W2+W4+W0
Target units: ate

## Estimate
Mean value: 1.0163090417491503

Causal Estimate is 1.0163090417491503

[14]:

# Textual Interpreter
interpretation = causal_estimate_ipw.interpret(method_name="textual_effect_interpreter")

Increasing the treatment variable(s) [v0] from 0 to 1 causes an increase of 1.0163090417491503 in the expected value of the outcome [['y']], over the data distribution/population represented by the dataset.

[15]:

interpretation = causal_estimate_ipw.interpret(method_name="confounder_distribution_interpreter", fig_size=(8,8), font_size=12, var_name='W4', var_type='discrete')

/home/runner/work/dowhy/dowhy/dowhy/interpreters/confounder_distribution_interpreter.py:83: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  barplot_df_before = df.groupby([self.var_name, treated]).size().reset_index(name="count")
/home/runner/work/dowhy/dowhy/dowhy/interpreters/confounder_distribution_interpreter.py:86: FutureWarning: The default of observed=False is deprecated and will be changed to True in a future version of pandas. Pass observed=False to retain current behavior or observed=True to adopt the future default and silence this warning.
  barplot_df_after = df.groupby([self.var_name, treated]).agg({"weight": np.sum}).reset_index()
/home/runner/work/dowhy/dowhy/dowhy/interpreters/confounder_distribution_interpreter.py:86: FutureWarning: The provided callable <function sum at 0x7fbbcce7aaf0> is currently using SeriesGroupBy.sum. In a future version of pandas, the provided callable will be used directly. To keep current behavior pass the string "sum" instead.
  barplot_df_after = df.groupby([self.var_name, treated]).agg({"weight": np.sum}).reset_index()

../_images/example_notebooks_dowhy_interpreter_27_1.png

[ ]: