Demo for the DoWhy causal API
We show a simple example of adding a causal extension to any dataframe.
[1]:
import dowhy.datasets
import dowhy.api
from dowhy.graph import build_graph_from_str
import numpy as np
import pandas as pd
from statsmodels.api import OLS
[2]:
data = dowhy.datasets.linear_dataset(beta=5,
                                     num_common_causes=1,
                                     num_instruments=0,
                                     num_samples=1000,
                                     treatment_is_binary=True)
df = data['df']
df['y'] = df['y'] + np.random.normal(size=len(df))  # Add noise; without it, the variance of Y | X, Z is zero and MCMC fails.
nx_graph = build_graph_from_str(data["dot_graph"])
treatment = data["treatment_name"][0]
outcome = data["outcome_name"][0]
common_cause = data["common_causes_names"][0]
df
[2]:
| | W0 | v0 | y |
|---|---|---|---|
| 0 | 0.218115 | True | 5.192952 |
| 1 | -2.214109 | False | -5.511422 |
| 2 | -1.604225 | False | -2.343443 |
| 3 | -1.730743 | False | -2.996998 |
| 4 | -2.448299 | False | -4.277865 |
| ... | ... | ... | ... |
| 995 | -2.522898 | False | -5.839558 |
| 996 | 0.204025 | False | -0.192892 |
| 997 | -1.183591 | False | -3.312228 |
| 998 | -3.190273 | False | -6.103659 |
| 999 | -1.579944 | False | -2.680262 |
1000 rows × 3 columns
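Treated units here tend to have larger values of W0, so the raw difference in group means overstates the treatment effect. A minimal, self-contained sketch on synthetic data of the same form (the names and coefficients are assumptions, not the notebook's `df`) shows the confounding bias that the do-operation is meant to remove:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 1000
w = rng.normal(size=n)                        # common cause, like W0
t = rng.random(n) < 1 / (1 + np.exp(-3 * w))  # treatment more likely when w is large
y = 5 * t + 2 * w + rng.normal(size=n)        # true causal effect of t on y is 5
toy = pd.DataFrame({'W0': w, 'v0': t, 'y': y})

# Naive difference in means confounds the treatment effect with the effect of W0
naive = toy.loc[toy['v0'], 'y'].mean() - toy.loc[~toy['v0'], 'y'].mean()
print(naive)  # well above the true effect of 5
```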
[3]:
# data['df'] is just a regular pandas.DataFrame
df.causal.do(x=treatment,
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             common_causes=[common_cause],
             ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[3]:
<Axes: xlabel='v0'>
[4]:
df.causal.do(x={treatment: 1},
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             method='weighting',
             common_causes=[common_cause]
             ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[4]:
<Axes: xlabel='v0'>
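The `'weighting'` method used above is based on inverse-propensity weighting: each unit is reweighted by the reciprocal of the probability of the treatment it actually received. A self-contained sketch of that estimator on synthetic data (the setup is an assumption for illustration, not DoWhy's internal code):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5000
w = rng.normal(size=n)                  # common cause
p = 1 / (1 + np.exp(-1.5 * w))          # true propensity P(treated | w)
t = rng.random(n) < p
y = 5 * t + 2 * w + rng.normal(size=n)  # true effect is 5

# Reweight by 1 / P(observed treatment | w), then compare weighted means
mean_treated = np.sum(y[t] / p[t]) / np.sum(1 / p[t])
mean_control = np.sum(y[~t] / (1 - p[~t])) / np.sum(1 / (1 - p[~t]))
print(mean_treated - mean_control)  # close to the true effect of 5
```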
[5]:
cdf_1 = df.causal.do(x={treatment: 1},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )
cdf_0 = df.causal.do(x={treatment: 0},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )
[6]:
cdf_0
[6]:
| | W0 | v0 | y | propensity_score | weight |
|---|---|---|---|---|---|
| 0 | -2.155910 | False | -3.533915 | 0.976749 | 1.023804 |
| 1 | -0.441365 | False | -1.525737 | 0.704672 | 1.419100 |
| 2 | -0.067522 | False | -1.295933 | 0.560761 | 1.783292 |
| 3 | -1.382885 | False | -2.705320 | 0.920173 | 1.086752 |
| 4 | -0.901488 | False | -1.921900 | 0.837450 | 1.194100 |
| ... | ... | ... | ... | ... | ... |
| 995 | -1.336045 | False | -3.185294 | 0.914225 | 1.093823 |
| 996 | -2.543715 | False | -4.624259 | 0.987710 | 1.012443 |
| 997 | -2.254571 | False | -3.938416 | 0.980217 | 1.020183 |
| 998 | -1.977254 | False | -3.226152 | 0.968902 | 1.032096 |
| 999 | -2.201811 | False | -3.335031 | 0.978431 | 1.022045 |
1000 rows × 5 columns
[7]:
cdf_1
[7]:
| | W0 | v0 | y | propensity_score | weight |
|---|---|---|---|---|---|
| 0 | -1.372984 | True | 1.117957 | 0.081052 | 12.337750 |
| 1 | 0.617011 | True | 4.452866 | 0.711133 | 1.406206 |
| 2 | 0.493633 | True | 6.485472 | 0.666967 | 1.499325 |
| 3 | -1.276278 | True | 0.402929 | 0.093948 | 10.644226 |
| 4 | 0.166752 | True | 3.515391 | 0.536849 | 1.862721 |
| ... | ... | ... | ... | ... | ... |
| 995 | -1.599239 | True | 1.807711 | 0.056967 | 17.554155 |
| 996 | -1.276278 | True | 0.402929 | 0.093948 | 10.644226 |
| 997 | -1.276278 | True | 0.402929 | 0.093948 | 10.644226 |
| 998 | -0.225997 | True | 5.033551 | 0.375343 | 2.664227 |
| 999 | -1.827456 | True | 0.308708 | 0.039604 | 25.250029 |
1000 rows × 5 columns
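Consistent with the two tables above, the `weight` column is simply the reciprocal of the propensity score of the treatment each row actually received. A quick check against a few of the printed values:

```python
import numpy as np

# Propensity scores copied from rows of cdf_0 and cdf_1 above
propensity = np.array([0.976749, 0.704672, 0.081052])
print(1.0 / propensity)  # ~[1.023804, 1.419100, 12.337750], matching the weight column
```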
Comparing the estimate to Linear Regression
First, we estimate the effect using the causal data frame, along with a 95% confidence interval.
[8]:
(cdf_1['y'] - cdf_0['y']).mean()
[8]:
$\displaystyle 5.56566625592569$
[9]:
1.96*(cdf_1['y'] - cdf_0['y']).std() / np.sqrt(len(df))
[9]:
$\displaystyle 0.173729530342326$
Comparing to the estimate from OLS.
[10]:
model = OLS(np.asarray(df[outcome]), np.asarray(df[[common_cause, treatment]], dtype=np.float64))
result = model.fit()
result.summary()
[10]:
| Dep. Variable: | y | R-squared (uncentered): | 0.931 |
|---|---|---|---|
| Model: | OLS | Adj. R-squared (uncentered): | 0.931 |
| Method: | Least Squares | F-statistic: | 6698. |
| Date: | Tue, 30 Jan 2024 | Prob (F-statistic): | 0.00 |
| Time: | 17:49:22 | Log-Likelihood: | -1385.5 |
| No. Observations: | 1000 | AIC: | 2775. |
| Df Residuals: | 998 | BIC: | 2785. |
| Df Model: | 2 | | |
| Covariance Type: | nonrobust | | |

| | coef | std err | t | P>\|t\| | [0.025 | 0.975] |
|---|---|---|---|---|---|---|
| x1 | 1.7779 | 0.024 | 75.608 | 0.000 | 1.732 | 1.824 |
| x2 | 5.0044 | 0.059 | 85.216 | 0.000 | 4.889 | 5.120 |

| Omnibus: | 4.605 | Durbin-Watson: | 1.973 |
|---|---|---|---|
| Prob(Omnibus): | 0.100 | Jarque-Bera (JB): | 5.538 |
| Skew: | 0.021 | Prob(JB): | 0.0627 |
| Kurtosis: | 3.362 | Cond. No. | 2.50 |
Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
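Note [1] above flags that the regression in cell [10] has no constant term. Because this dataset's outcome is generated without an intercept, that omission is harmless here; a small numpy sketch on synthetic data of the same form (assumed for illustration, not the notebook's `df`) shows the treatment coefficient is recovered either way:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
w = rng.normal(size=n)
t = (rng.random(n) < 0.5).astype(float)  # randomized treatment for simplicity
y = 5 * t + 2 * w + rng.normal(size=n)   # no intercept in the data-generating process

X_no_const = np.column_stack([w, t])
X_const = np.column_stack([np.ones(n), w, t])  # add a constant column
beta_no_const, *_ = np.linalg.lstsq(X_no_const, y, rcond=None)
beta_const, *_ = np.linalg.lstsq(X_const, y, rcond=None)
print(beta_no_const[1], beta_const[2])   # both near the true effect of 5
```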