DoWhy example on the IHDP (Infant Health and Development Program) dataset

[1]:

# importing required libraries
import os, sys
sys.path.append(os.path.abspath("../../../"))
import dowhy
from dowhy import CausalModel
import pandas as pd
import numpy as np

Loading Data

[2]:

data = pd.read_csv("https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/IHDP/csv/ihdp_npci_1.csv", header=None)
# The CSV has no header row: name the treatment, outcome and mu columns,
# followed by the 25 covariates x1..x25.
col = ["treatment", "y_factual", "y_cfactual", "mu0", "mu1"]
col += ["x" + str(i) for i in range(1, 26)]
data.columns = col
# DoWhy expects a boolean treatment column here.
data = data.astype({"treatment": "bool"}, copy=False)
data.head()

[2]:

	treatment	y_factual	y_cfactual	mu0	mu1	x1	x2	x3	x4	x5	...	x16	x17	x18	x19
0	True	5.599916	4.318780	3.268256	6.854457	-0.528603	-0.343455	1.128554	0.161703	-0.316603	...	1	1	1	1
1	False	6.875856	7.856495	6.636059	7.562718	-1.736945	-1.802002	0.383828	2.244320	-0.629189	...	1	1	1	1
2	False	2.996273	6.633952	1.570536	6.121617	-0.807451	-0.202946	-0.360898	-0.879606	0.808706	...	1	0	1	1
3	False	1.366206	5.697239	1.244738	5.889125	0.390083	0.596582	-1.850350	-0.879606	-0.004017	...	1	0	1	1
4	False	1.963538	6.202582	1.685048	6.191994	-1.045229	-0.602710	0.011465	0.161703	0.683672	...	1	1	1	1

5 rows × 30 columns

1. Model

[3]:

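# Create a causal model from the data and given common causes.
# Build the covariate names directly as a list: splitting a "+"-terminated
# string on "+" would leave an empty name '' at the end of the list.
xs = ["x" + str(i) for i in range(1, 26)]

model = CausalModel(
        data=data,
        treatment='treatment',
        outcome='y_factual',
        common_causes=xs
        )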

WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['treatment'] on outcome ['y_factual']

2. Identify

[4]:

# Identify the causal effect
identified_estimand = model.identify_effect()

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['', 'x8', 'x13', 'x21', 'x3', 'x14', 'x10', 'x6', 'x1', 'x24', 'x18', 'x15', 'x7', 'x12', 'x9', 'x22', 'x2', 'x17', 'x19', 'x11', 'x16', 'x4', 'x20', 'x25', 'x23', 'x5']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.

WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y

INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]
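Since no instrumental variables were found, the only estimand available here is the backdoor one, which adjusts for the listed common causes. Writing T for the binary treatment, Y for the outcome and X for the covariates x1..x25 (symbols introduced here for illustration), and assuming unconfoundedness given X, the quantity being targeted can be written as:

```latex
\[
\mathrm{ATE}
  = \mathbb{E}_{X}\Big[\,\mathbb{E}[Y \mid T = 1, X] - \mathbb{E}[Y \mid T = 0, X]\,\Big]
\]
```

Each estimator in the next section is a different way of approximating the inner conditional expectations from the observed data.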

3. Estimate (using different methods)

3.1 Using Linear Regression

[5]:

# Estimate the causal effect and compare it with the naive difference in mean
# observed outcomes between treated and control units (printed as "ATE" below).
estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression", test_significance=True
)

print(estimate)

print("Causal Estimate is " + str(estimate.value))
data_1 = data[data["treatment"]==1]
data_0 = data[data["treatment"]==0]

print("ATE", np.mean(data_1["y_factual"])- np.mean(data_0["y_factual"]))

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5

*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(Expectation(y_factual|x8,x13,x21,x3,x14,x10,x6,x1,x24,x18,x15,x7,x12,x9,x22,x2,x17,x19,x11,x16,x4,x20,x25,x23,x5))
d[treatment]

Estimand assumption 1, Unconfoundedness: If U→{treatment} and U→y_factual then P(y_factual|treatment,x8,x13,x21,x3,x14,x10,x6,x1,x24,x18,x15,x7,x12,x9,x22,x2,x17,x19,x11,x16,x4,x20,x25,x23,x5,U) = P(y_factual|treatment,x8,x13,x21,x3,x14,x10,x6,x1,x24,x18,x15,x7,x12,x9,x22,x2,x17,x19,x11,x16,x4,x20,x25,x23,x5)
### Estimand : 2
Estimand name: iv
No such variable found!

## Realized estimand
b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5
## Estimate
Value: 3.92867175087271

## Statistical Significance
p-value: <0.001

Causal Estimate is 3.92867175087271
ATE 4.021121012430829
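
The value printed as "ATE" above is the unadjusted difference in mean factual outcomes between treated and control units, so it is a baseline rather than a ground truth. Assuming the `mu0` and `mu1` columns hold each unit's noiseless potential outcomes without and with treatment (an assumption based on the column names, not something this notebook states), a reference effect could be computed as the mean of `mu1 - mu0`. A minimal standard-library sketch, using only the five rows shown by `data.head()` and therefore purely illustrative:

```python
from statistics import mean

# First five rows from data.head(): (treatment, y_factual, mu0, mu1).
rows = [
    (True,  5.599916, 3.268256, 6.854457),
    (False, 6.875856, 6.636059, 7.562718),
    (False, 2.996273, 1.570536, 6.121617),
    (False, 1.366206, 1.244738, 5.889125),
    (False, 1.963538, 1.685048, 6.191994),
]


def naive_difference(rows):
    """Unadjusted difference in mean factual outcome, treated minus control."""
    treated = [y for t, y, _, _ in rows if t]
    control = [y for t, y, _, _ in rows if not t]
    return mean(treated) - mean(control)


def reference_ate(rows):
    """Mean of mu1 - mu0, ASSUMING these columns are noiseless potential outcomes."""
    return mean(mu1 - mu0 for _, _, mu0, mu1 in rows)


print("naive difference:", naive_difference(rows))
print("reference ATE (assumed):", reference_ate(rows))
```

On the full DataFrame the second quantity would be `(data["mu1"] - data["mu0"]).mean()`, under the same assumption about what the columns contain.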

3.2 Using Propensity Score Matching

[6]:

estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_matching"
)

print("Causal Estimate is " + str(estimate.value))

print("ATE", np.mean(data_1["y_factual"])- np.mean(data_0["y_factual"]))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Matching Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5
/home/amshar/python-environments/vpy36/lib/python3.6/site-packages/sklearn/utils/validation.py:744: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().
  y = column_or_1d(y, warn=True)
/mnt/c/Users/amshar/code/dowhy/dowhy/causal_estimators/propensity_score_matching_estimator.py:62: FutureWarning: `item` has been deprecated and will be removed in a future version
  control_outcome = control.iloc[indices[i]][self._outcome_name].item()
/mnt/c/Users/amshar/code/dowhy/dowhy/causal_estimators/propensity_score_matching_estimator.py:77: FutureWarning: `item` has been deprecated and will be removed in a future version
  treated_outcome = treated.iloc[indices[i]][self._outcome_name].item()

Causal Estimate is 3.9791388232170393
ATE 4.021121012430829
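
To make the idea behind propensity score matching concrete, the following is a minimal sketch of one-to-one nearest-neighbour matching with replacement. It is not DoWhy's implementation: the function name `matching_ate` and the toy inputs are hypothetical, and the propensity scores are taken as given rather than fitted.

```python
from statistics import mean


def matching_ate(scores, treated, outcomes):
    """ATE via nearest-neighbour matching on the propensity score (with replacement)."""
    t_idx = [i for i, t in enumerate(treated) if t]
    c_idx = [i for i, t in enumerate(treated) if not t]

    def nearest(i, pool):
        # Unit in `pool` whose propensity score is closest to unit i's score.
        return min(pool, key=lambda j: abs(scores[j] - scores[i]))

    effects = []
    # Each treated unit is compared with its closest control unit ...
    for i in t_idx:
        effects.append(outcomes[i] - outcomes[nearest(i, c_idx)])
    # ... and each control unit with its closest treated unit.
    for j in c_idx:
        effects.append(outcomes[nearest(j, t_idx)] - outcomes[j])
    return mean(effects)


# Hypothetical toy data: two low-propensity and two high-propensity units.
toy_scores = [0.2, 0.25, 0.7, 0.75]
toy_treated = [True, False, True, False]
toy_outcomes = [5.0, 2.0, 9.0, 5.0]
print(matching_ate(toy_scores, toy_treated, toy_outcomes))
```

Matching with replacement keeps the sketch short; it lets one control unit serve as the match for several treated units.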

3.3 Using Propensity Score Stratification

[7]:

estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification",
        method_params={'num_strata':50, 'clipping_threshold':5}
)

print("Causal Estimate is " + str(estimate.value))
print("ATE", np.mean(data_1["y_factual"])- np.mean(data_0["y_factual"]))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5

Causal Estimate is 3.4550471588628207
ATE 4.021121012430829
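
The stratification call above passes `num_strata` and `clipping_threshold`. The sketch below illustrates one way such an estimator can work: units are ranked by propensity score, cut into equal-sized strata, and the per-stratum differences in means are averaged with weights proportional to stratum size. It reads `clipping_threshold` as the minimum number of treated and of control units a stratum needs in order to be kept, which is an assumption about the parameter's meaning. The function name and toy data are hypothetical, and this is not DoWhy's code.

```python
from statistics import mean


def stratified_effect(scores, treated, outcomes, num_strata=5, clipping_threshold=1):
    """Size-weighted average of per-stratum differences in mean outcome."""
    min_units = max(1, clipping_threshold)  # a stratum needs at least one unit per arm
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    weighted_sum = 0.0
    total_weight = 0
    for s in range(num_strata):
        # Equal-sized strata by propensity-score rank.
        idx = order[s * n // num_strata:(s + 1) * n // num_strata]
        y1 = [outcomes[i] for i in idx if treated[i]]
        y0 = [outcomes[i] for i in idx if not treated[i]]
        if len(y1) < min_units or len(y0) < min_units:
            continue  # drop strata with too few treated or control units
        weighted_sum += len(idx) * (mean(y1) - mean(y0))
        total_weight += len(idx)
    if total_weight == 0:
        raise ValueError("no stratum has enough treated and control units")
    return weighted_sum / total_weight


# Hypothetical toy data: eight units, split into two strata of four.
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
toy_treated = [False, True, False, True, True, False, True, True]
toy_outcomes = [1.0, 3.0, 1.0, 5.0, 6.0, 4.0, 6.0, 6.0]
print(stratified_effect(toy_scores, toy_treated, toy_outcomes, num_strata=2))
```

Raising `clipping_threshold` discards sparse strata, which trades coverage of the propensity range for more stable per-stratum estimates.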

3.4 Using Propensity Score Weighting

[8]:

estimate = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_weighting"
)

print("Causal Estimate is " + str(estimate.value))

print("ATE", np.mean(data_1["y_factual"])- np.mean(data_0["y_factual"]))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5

Causal Estimate is 3.409737824406429
ATE 4.021121012430829
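
As an illustration of propensity score weighting, the sketch below computes a normalized inverse-propensity-weighted estimate: treated units are weighted by 1/e(x), control units by 1/(1 - e(x)), and each arm's weighted mean outcome is compared. Which weighting scheme DoWhy applies by default is not shown in this notebook, so treat the normalized form here as an assumption. The function name and toy data are hypothetical.

```python
def ipw_ate(scores, treated, outcomes):
    """Normalized inverse-propensity-weighted difference in mean outcomes."""
    num1 = den1 = num0 = den0 = 0.0
    for e, t, y in zip(scores, treated, outcomes):
        if t:
            w = 1.0 / e            # treated units: weight 1 / e(x)
            num1 += w * y
            den1 += w
        else:
            w = 1.0 / (1.0 - e)    # control units: weight 1 / (1 - e(x))
            num0 += w * y
            den0 += w
    # Weighted mean outcome of the treated minus that of the controls.
    return num1 / den1 - num0 / den0


# Hypothetical toy data with propensity scores taken as given.
toy_scores = [0.8, 0.2, 0.5, 0.5]
toy_treated = [True, True, False, False]
toy_outcomes = [10.0, 2.0, 1.0, 3.0]
print(ipw_ate(toy_scores, toy_treated, toy_outcomes))
```

When every unit has the same propensity score the weights cancel and the estimate reduces to the plain difference in means.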


4. Refute

4.1 Random Common Cause Refuter

[9]:

refute_results=model.refute_estimate(identified_estimand, estimate,
        method_name="random_common_cause")
print(refute_results)

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5+w_random

Refute: Add a Random Common Cause
Estimated effect:(3.409737824406429,)
New effect:(3.4008436132771305,)

4.2 Placebo Treatment Refuter

[10]:

res_placebo=model.refute_estimate(identified_estimand, estimate,
        method_name="placebo_treatment_refuter", placebo_type="permute")
print(res_placebo)

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y_factual~placebo+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5

Refute: Use a Placebo Treatment
Estimated effect:(3.409737824406429,)
New effect:(-0.08870810484238234,)
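
The placebo result above is close to zero, which is what a sound estimate should produce once the real treatment is replaced by a shuffled one. The sketch below reproduces that logic in miniature with a plain difference in means standing in for the notebook's propensity score weighting estimator. The function names, the number of simulations and the synthetic data are all hypothetical.

```python
import random
from statistics import mean


def diff_in_means(treated, outcomes):
    """Mean outcome of treated units minus mean outcome of control units."""
    y1 = [y for t, y in zip(treated, outcomes) if t]
    y0 = [y for t, y in zip(treated, outcomes) if not t]
    return mean(y1) - mean(y0)


def placebo_effect(treated, outcomes, num_simulations=200, seed=0):
    """Average effect estimate after randomly permuting the treatment column."""
    rng = random.Random(seed)
    effects = []
    for _ in range(num_simulations):
        placebo = list(treated)
        rng.shuffle(placebo)  # breaks any link between treatment and outcome
        effects.append(diff_in_means(placebo, outcomes))
    return mean(effects)


# Synthetic data: treatment adds exactly 3 to an outcome that otherwise
# depends only on the unit index.
treated = [i % 2 == 0 for i in range(200)]
outcomes = [3.0 * t + (i % 5) for i, t in enumerate(treated)]

print("estimated effect:", diff_in_means(treated, outcomes))
print("placebo effect:", placebo_effect(treated, outcomes))
```

A placebo effect far from zero would suggest that the estimator is picking up structure unrelated to the treatment.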

4.3 Data Subset Refuter

[11]:

res_subset=model.refute_estimate(identified_estimand, estimate,
        method_name="data_subset_refuter", subset_fraction=0.9)
print(res_subset)

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y_factual~treatment+x8+x13+x21+x3+x14+x10+x6+x1+x24+x18+x15+x7+x12+x9+x22+x2+x17+x19+x11+x16+x4+x20+x25+x23+x5

Refute: Use a subset of data
Estimated effect:(3.409737824406429,)
New effect:(3.4424088676372993,)