Demo for the DoWhy causal API
We show a simple example of adding a causal extension to any pandas DataFrame.
[1]:
import dowhy.datasets
import dowhy.api
from dowhy.graph import build_graph_from_str
import numpy as np
import pandas as pd
from statsmodels.api import OLS
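Importing `dowhy.api` is what adds the extension: as a side effect, it registers a `causal` accessor on every `pandas.DataFrame`. A minimal sketch of the pattern, using a hypothetical toy frame:

import pandas as pd
import dowhy.api  # imported for its side effect: registers the df.causal accessor

toy = pd.DataFrame({"treatment": [0, 1], "outcome": [1.0, 2.0]})
# After the import, any DataFrame exposes the accessor and its do() method
assert hasattr(toy.causal, "do")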
[2]:
data = dowhy.datasets.linear_dataset(beta=5,
                                     num_common_causes=1,
                                     num_instruments=0,
                                     num_samples=1000,
                                     treatment_is_binary=True)
df = data['df']
df['y'] = df['y'] + np.random.normal(size=len(df))  # Add noise to the outcome. Without noise, the variance of y given v0 and W0 is zero, and MCMC sampling fails.
nx_graph = build_graph_from_str(data["dot_graph"])
treatment = data["treatment_name"][0]
outcome = data["outcome_name"][0]
common_cause = data["common_causes_names"][0]
df
[2]:
|     | W0        | v0    | y         |
| --- | --------- | ----- | --------- |
| 0   | 1.734861  | True  | 5.677652  |
| 1   | 1.692967  | True  | 5.345768  |
| 2   | -0.678012 | True  | 5.865227  |
| 3   | 0.320417  | False | -1.465444 |
| 4   | -1.743889 | True  | 5.025204  |
| ... | ...       | ...   | ...       |
| 995 | -1.195709 | False | -0.097315 |
| 996 | 1.086976  | False | 1.035984  |
| 997 | 0.115227  | True  | 4.093856  |
| 998 | 0.986937  | True  | 4.631330  |
| 999 | 0.226167  | True  | 5.485148  |

1000 rows × 3 columns
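Every `do` call below passes a `variable_types` mapping that tells the do-sampler how to model each column: consistent with the usage here, 'b' marks a binary variable and 'c' a continuous one. A sketch of the mapping reused throughout (the `vtypes` name is ours, not part of the API):

# For this dataset: {'v0': 'b', 'y': 'c', 'W0': 'c'}
vtypes = {treatment: 'b', outcome: 'c', common_cause: 'c'}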
[3]:
# data['df'] is just a regular pandas.DataFrame
df.causal.do(x=treatment,
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             common_causes=[common_cause],
             ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[3]:
<Axes: xlabel='v0'>
[Figure: bar plot of the mean of y for v0=False and v0=True]
[4]:
df.causal.do(x={treatment: 1},
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             method='weighting',
             common_causes=[common_cause]
             ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[4]:
<Axes: xlabel='v0'>
[Figure: bar plot of the mean of y for v0=False and v0=True, weighting method]
[5]:
cdf_1 = df.causal.do(x={treatment: 1},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )
cdf_0 = df.causal.do(x={treatment: 0},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )
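These calls pass `graph=nx_graph` instead of listing `common_causes`, so `do` reads the adjustment set off the causal graph. For this dataset the graph is the single-confounder structure W0 → v0, W0 → y, v0 → y; since `build_graph_from_str` returns a networkx `DiGraph`, an equivalent graph could also be built by hand (a sketch, assuming networkx is available):

import networkx as nx
# W0 confounds the effect of the treatment v0 on the outcome y
g = nx.DiGraph([("W0", "v0"), ("W0", "y"), ("v0", "y")])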
[6]:
cdf_0
[6]:
|     | W0       | v0    | y         | propensity_score | weight   |
| --- | -------- | ----- | --------- | ---------------- | -------- |
| 0   | 1.425122 | False | 1.553599  | 0.468980         | 2.132286 |
| 1   | 1.522968 | False | 2.940058  | 0.464910         | 2.150952 |
| 2   | 1.624560 | False | -0.714125 | 0.460690         | 2.170659 |
| 3   | 1.883998 | False | -1.628021 | 0.449938         | 2.222528 |
| 4   | 0.781901 | False | -1.317539 | 0.495812         | 2.016892 |
| ... | ...      | ...   | ...       | ...              | ...      |
| 995 | 1.883998 | False | -1.628021 | 0.449938         | 2.222528 |
| 996 | 1.624560 | False | -0.714125 | 0.460690         | 2.170659 |
| 997 | 0.027170 | False | 1.790618  | 0.527316         | 1.896398 |
| 998 | 0.538403 | False | -1.014590 | 0.505985         | 1.976345 |
| 999 | 0.007696 | False | 1.750373  | 0.528127         | 1.893485 |

1000 rows × 5 columns
[7]:
cdf_1
[7]:
|     | W0       | v0   | y        | propensity_score | weight   |
| --- | -------- | ---- | -------- | ---------------- | -------- |
| 0   | 1.628855 | True | 5.879210 | 0.539489         | 1.853607 |
| 1   | 0.800083 | True | 5.501920 | 0.504947         | 1.980405 |
| 2   | 1.329587 | True | 6.044962 | 0.527042         | 1.897382 |
| 3   | 2.467841 | True | 2.798915 | 0.574072         | 1.741940 |
| 4   | 0.387281 | True | 3.828899 | 0.487704         | 2.050424 |
| ... | ...      | ...  | ...      | ...              | ...      |
| 995 | 2.490357 | True | 6.209058 | 0.574992         | 1.739154 |
| 996 | 0.802514 | True | 5.794019 | 0.505049         | 1.980007 |
| 997 | 0.835262 | True | 4.489131 | 0.506417         | 1.974659 |
| 998 | 0.080196 | True | 4.995161 | 0.474894         | 2.105735 |
| 999 | 1.739403 | True | 3.959105 | 0.544075         | 1.837982 |

1000 rows × 5 columns
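Both frames carry the estimated propensity score and the resulting inverse-probability weight for each sampled row; in the output above, each weight is simply the reciprocal of the propensity score of the realized treatment arm. A quick sanity check (a sketch, assuming the column names shown above):

# Under inverse-propensity weighting, weight == 1 / P(assigned arm | W0)
assert np.allclose(cdf_0['weight'], 1.0 / cdf_0['propensity_score'])
assert np.allclose(cdf_1['weight'], 1.0 / cdf_1['propensity_score'])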
Comparing the estimate to Linear Regression
First, we estimate the effect using the causal data frame, along with its 95% confidence interval.
[8]:
(cdf_1['y'] - cdf_0['y']).mean()
[8]:
5.01130335938293
[9]:
1.96*(cdf_1['y'] - cdf_0['y']).std() / np.sqrt(len(df))
[9]:
0.0886036474860154
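This normal-approximation interval treats the per-row differences as independent draws; a nonparametric bootstrap over the same differences gives a quick cross-check (a sketch, not part of the DoWhy API):

# Resample the paired differences and take the empirical 95% interval
diffs = cdf_1['y'].to_numpy() - cdf_0['y'].to_numpy()
boot_means = [np.random.choice(diffs, size=len(diffs), replace=True).mean() for _ in range(1000)]
print(np.percentile(boot_means, [2.5, 97.5]))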
Comparing to the estimate from OLS.
[10]:
model = OLS(np.asarray(df[outcome]), np.asarray(df[[common_cause, treatment]], dtype=np.float64))
result = model.fit()
result.summary()
[10]:
| Dep. Variable:    | y                | R-squared (uncentered):      | 0.934   |
| ----------------- | ---------------- | ---------------------------- | ------- |
| Model:            | OLS              | Adj. R-squared (uncentered): | 0.934   |
| Method:           | Least Squares    | F-statistic:                 | 7086.   |
| Date:             | Sat, 12 Jul 2025 | Prob (F-statistic):          | 0.00    |
| Time:             | 13:59:10         | Log-Likelihood:              | -1425.1 |
| No. Observations: | 1000             | AIC:                         | 2854.   |
| Df Residuals:     | 998              | BIC:                         | 2864.   |
| Df Model:         | 2                |                              |         |
| Covariance Type:  | nonrobust        |                              |         |

|    | coef   | std err | t      | P>\|t\| | [0.025 | 0.975] |
| -- | ------ | ------- | ------ | ------- | ------ | ------ |
| x1 | 0.2809 | 0.028   | 9.906  | 0.000   | 0.225  | 0.337  |
| x2 | 5.0450 | 0.052   | 97.125 | 0.000   | 4.943  | 5.147  |

| Omnibus:       | 1.607 | Durbin-Watson:    | 2.017 |
| -------------- | ----- | ----------------- | ----- |
| Prob(Omnibus): | 0.448 | Jarque-Bera (JB): | 1.600 |
| Skew:          | 0.047 | Prob(JB):         | 0.449 |
| Kurtosis:      | 2.828 | Cond. No.         | 2.33  |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
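For a direct side-by-side, the treatment coefficient and its confidence interval can be read off the fitted result; since the design matrix was passed as a bare array, the treatment is column index 1 (reported as x2 above). A sketch:

print(result.params[1])      # treatment coefficient: ~5.0450
print(result.conf_int()[1])  # 95% interval: ~[4.943, 5.147]

Both the do-sampler estimate (≈5.01) and the OLS treatment coefficient (≈5.05) recover the true effect beta=5 used to generate the data.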