# DoWhy example on the Lalonde dataset

Thanks to [@mizuy](https://github.com/mizuy) for providing this example. Here we use the Lalonde dataset and apply IPW estimator to it.

[1]:

import os, sys
sys.path.append(os.path.abspath("../../../"))

import dowhy
from dowhy import CausalModel
from rpy2.robjects import r as R

#%R install.packages("Matching")
%R library(Matching)

R[write to console]: Loading required package: MASS

R[write to console]: ##
##  Matching (Version 4.10-2, Build Date: 2022-04-13)
##  See http://sekhon.berkeley.edu/matching for additional documentation.
##   Jasjeet S. Sekhon. 2011. Multivariate and Propensity Score Matching
##   Software with Automated Balance Optimization: The Matching package for R.''
##   Journal of Statistical Software, 42(7): 1-52.
##


[1]:

StrVector with 10 elements.
 'Matching' 'MASS' 'tools' ... 'datasets' 'methods' 'base'

[2]:

%R data(lalonde)
%R -o lalonde
lalonde = lalonde.astype({'treat':'bool'}, copy=False)


## Run DoWhy analysis: model, identify, estimate

[3]:

model=CausalModel(
data = lalonde,
treatment='treat',
outcome='re78',
common_causes='nodegr+black+hisp+age+educ+married'.split('+'))
identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)
estimate = model.estimate_effect(identified_estimand,
method_name="backdoor.propensity_score_weighting",
target_units="ate",
method_params={"weighting_scheme":"ips_weight"})
#print(estimate)
print("Causal Estimate is " + str(estimate.value))

import statsmodels.formula.api as smf
reg=smf.wls('re78~1+treat', data=lalonde, weights=lalonde.ips_stabilized_weight)
res=reg.fit()
res.summary()

Causal Estimate is 1639.806207408078

[3]:

Dep. Variable: R-squared: re78 0.015 WLS 0.013 Least Squares 6.743 Wed, 14 Sep 2022 0.00973 18:56:19 -4544.7 445 9093. 443 9102. 1 nonrobust
coef std err t P>|t| [0.025 0.975] 4555.0761 406.704 11.200 0.000 3755.767 5354.385 1639.8062 631.495 2.597 0.010 398.707 2880.905
 Omnibus: Durbin-Watson: 303.262 2.085 0 4770.57 2.709 0 18.097 2.47

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

## Interpret the estimate

The plot below shows how the distribution of a confounder, “married” changes from the original data to the weighted data. In both datasets, we compare the distribution of “married” across treated and untreated units.

[4]:

estimate.interpret(method_name="confounder_distribution_interpreter",var_type='discrete',
var_name='married', fig_size = (10, 7), font_size = 12)


## Sanity check: compare to manual IPW estimate

[5]:

df = model._data
ps = df['propensity_score']
y = df['re78']
z = df['treat']

ey1 = z*y/ps / sum(z/ps)
ey0 = (1-z)*y/(1-ps) / sum((1-z)/(1-ps))
ate = ey1.sum()-ey0.sum()
print("Causal Estimate is " + str(ate))

# correct -> Causal Estimate is 1634.9868359746906

Causal Estimate is 1639.80620740808