Demo for the DoWhy causal API

We show a simple example of adding a causal extension to any pandas DataFrame.

[1]:
import dowhy
dowhy.enable_notebook_rendering()

import dowhy.datasets
import dowhy.api
from dowhy.graph import build_graph_from_str

import numpy as np
import pandas as pd

from statsmodels.api import OLS
[2]:
data = dowhy.datasets.linear_dataset(beta=5,
        num_common_causes=1,
        num_instruments=0,
        num_samples=1000,
        treatment_is_binary=True)
df = data['df']
df['y'] = df['y'] + np.random.normal(size=len(df))  # Add noise; without it, the variance of Y | X, Z is zero and MCMC fails.
nx_graph = build_graph_from_str(data["dot_graph"])

treatment = data["treatment_name"][0]
outcome = data["outcome_name"][0]
common_cause = data["common_causes_names"][0]
df
[2]:
           W0     v0         y
0   -0.100021  False  0.132437
1    0.534403   True  4.776436
2    0.111168  False -0.824853
3    1.370331   True  8.595711
4   -0.618126  False -0.541416
..        ...    ...       ...
995 -0.379531   True  2.793668
996 -1.785375   True  2.363306
997 -2.354834   True  1.268439
998 -0.957736   True  3.688295
999 -1.240677  False -0.713537

1000 rows × 3 columns
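
Because W0 is a common cause of both v0 and y, the raw difference in group means of y is a biased estimate of the effect. As a quick illustration of the confounding (this check is not part of the original notebook):

# Naive (unadjusted) contrast: biased whenever W0 is distributed
# differently across the two treatment groups.
naive = df.groupby('v0')['y'].mean()
print(naive[True] - naive[False])  # generally != beta = 5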

[3]:
# data['df'] is just a regular pandas.DataFrame
df.causal.do(x=treatment,
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             common_causes=[common_cause],
            ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[3]:
<Axes: xlabel='v0'>
[figure: bar chart of the mean of y in each v0 group]
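
Under the hood, do() draws samples from the interventional distribution, adjusting for the listed common causes. For intuition, here is a minimal stratification sketch of the same backdoor adjustment (an illustration only, not DoWhy's implementation):

# Backdoor adjustment by stratification: estimate E[y | v0, W0-stratum]
# within bins of the common cause, then average over the bin frequencies.
def adjusted_mean(frame, treat_value, bins=10):
    strata = pd.qcut(frame['W0'], bins)
    grouped = frame.groupby(strata, observed=True)
    means = grouped.apply(lambda g: g.loc[g['v0'] == treat_value, 'y'].mean())
    weights = grouped.size() / len(frame)
    return float((means * weights).sum())

print(adjusted_mean(df, True) - adjusted_mean(df, False))  # close to beta = 5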
[4]:
df.causal.do(x={treatment: 1},
             variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
             outcome=outcome,
             method='weighting',
             common_causes=[common_cause]
             ).groupby(treatment).mean().plot(y=outcome, kind='bar')
[4]:
<Axes: xlabel='v0'>
[figure: bar chart of the mean of y in each v0 group]
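
The method='weighting' variant reweights samples by inverse propensity scores. A minimal self-contained sketch of that idea, assuming a logistic propensity model (illustrative, not DoWhy's exact implementation):

from sklearn.linear_model import LogisticRegression

X = df[['W0']].to_numpy()
t = df['v0'].astype(int).to_numpy()
y = df['y'].to_numpy()

# Propensity score e(W0) = P(v0 = 1 | W0), fit with a logistic model.
e = LogisticRegression().fit(X, t).predict_proba(X)[:, 1]

# IPW estimate of E[y | do(v0=1)] - E[y | do(v0=0)].
ate = (t * y / e).mean() - ((1 - t) * y / (1 - e)).mean()
print(ate)  # close to beta = 5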
[5]:
cdf_1 = df.causal.do(x={treatment: 1},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )

cdf_0 = df.causal.do(x={treatment: 0},
                     variable_types={treatment: 'b', outcome: 'c', common_cause: 'c'},
                     outcome=outcome,
                     graph=nx_graph
                     )

[6]:
cdf_0
[6]:
           W0     v0         y  propensity_score    weight
0   -0.279066  False  0.311358          0.504760  1.981141
1   -0.355988  False -0.875442          0.507923  1.968801
2    1.127934  False  2.253372          0.447079  2.236739
3    0.929628  False  1.149448          0.455159  2.197035
4    1.048707  False  2.814462          0.450304  2.220721
..        ...    ...       ...               ...       ...
995  1.967333  False  3.934154          0.413237  2.419917
996 -0.929117  False -2.203187          0.531459  1.881612
997 -1.862456  False -2.065187          0.569443  1.756101
998 -0.014084  False  0.942722          0.493860  2.024867
999  0.373045  False  0.639678          0.477948  2.092276

1000 rows × 5 columns

[7]:
cdf_1
[7]:
           W0    v0         y  propensity_score    weight
0   -1.016380  True  2.974043          0.464967  2.150690
1   -1.750903  True  3.555488          0.435063  2.298520
2   -0.605960  True  2.955557          0.481801  2.075546
3   -0.125322  True  4.851993          0.501565  1.993761
4   -1.366810  True  2.508822          0.450655  2.218991
..        ...   ...       ...               ...       ...
995  0.318000  True  8.975365          0.519791  1.923849
996  1.241249  True  5.593692          0.557525  1.793641
997 -1.381995  True  3.793374          0.450037  2.222041
998  0.801727  True  7.344791          0.539617  1.853165
999 -1.030269  True  3.333833          0.464399  2.153323

1000 rows × 5 columns
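
In both samples, the weight column matches the inverse of the propensity score of the forced treatment arm, which can be checked directly (assuming that interpretation of the two columns):

# weight should be ~ 1 / P(forced arm | W0) for every sampled row.
print(np.allclose(cdf_1['weight'], 1 / cdf_1['propensity_score']),
      np.allclose(cdf_0['weight'], 1 / cdf_0['propensity_score']))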

Comparing the estimate to linear regression

First, we estimate the effect using the causal data frame, together with its 95% confidence interval.

[8]:
(cdf_1['y'] - cdf_0['y']).mean()
[8]:
5.06699896618239
[9]:
1.96*(cdf_1['y'] - cdf_0['y']).std() / np.sqrt(len(df))
[9]:
0.163468403376322
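
The interval above relies on a normal approximation. As a cross-check, a simple percentile bootstrap over the same contrast (illustrative; not part of the DoWhy API):

# Resample the per-row differences and read off the 2.5th/97.5th percentiles.
rng = np.random.default_rng(0)
diff = cdf_1['y'].to_numpy() - cdf_0['y'].to_numpy()
boots = [rng.choice(diff, size=len(diff), replace=True).mean() for _ in range(2000)]
print(np.percentile(boots, [2.5, 97.5]))  # should bracket the point estimate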

Next, we compare this to the estimate from OLS.

[10]:
model = OLS(np.asarray(df[outcome]), np.asarray(df[[common_cause, treatment]], dtype=np.float64))
result = model.fit()
result.summary()
[10]:
                                 OLS Regression Results
=======================================================================================
Dep. Variable:                      y   R-squared (uncentered):                   0.935
Model:                            OLS   Adj. R-squared (uncentered):              0.935
Method:                 Least Squares   F-statistic:                              7149.
Date:                Wed, 22 Apr 2026   Prob (F-statistic):                        0.00
Time:                        08:22:17   Log-Likelihood:                         -1405.6
No. Observations:                1000   AIC:                                      2815.
Df Residuals:                     998   BIC:                                      2825.
Df Model:                           2
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
x1             1.6382      0.030     54.469      0.000       1.579       1.697
x2             5.0559      0.045    112.840      0.000       4.968       5.144
==============================================================================
Omnibus:                        0.105   Durbin-Watson:                   1.958
Prob(Omnibus):                  0.949   Jarque-Bera (JB):                0.176
Skew:                          -0.005   Prob(JB):                        0.916
Kurtosis:                       2.936   Cond. No.                         1.53
==============================================================================


Notes:
[1] R² is computed without centering (uncentered) since the model does not contain a constant.
[2] Standard Errors assume that the covariance matrix of the errors is correctly specified.
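
Note that the OLS above is fit without an intercept, which is why statsmodels reports the uncentered R². The x2 coefficient (the treatment) still lands close to the true effect of 5. For completeness, a variant with an explicit intercept would look like this (a sketch; whether the constant is needed depends on the data-generating process):

import statsmodels.api as sm

X = sm.add_constant(np.asarray(df[[common_cause, treatment]], dtype=np.float64))
print(sm.OLS(np.asarray(df[outcome]), X).fit().params)  # last coefficient ~ 5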