A simple example of using the instrumental variables method for estimation

[1]:
import numpy as np
import pandas as pd
import patsy as ps

from statsmodels.sandbox.regression.gmm import IV2SLS
import os, sys
sys.path.append(os.path.abspath("../../../"))
from dowhy import CausalModel
[2]:
n_points = 1000
education_ability = 1
education_voucher = 0.5
income_ability = 2
income_education = 4  # true causal effect of education on income


# confounder
ability = np.random.normal(0, 3, size=n_points)

# instrument
voucher = np.random.normal(2, 1, size=n_points)

# treatment
education = (np.random.normal(5, 1, size=n_points)
             + education_ability * ability
             + education_voucher * voucher)

# outcome
income = (np.random.normal(10, 3, size=n_points)
          + income_ability * ability
          + income_education * education)

# build dataset
data = np.stack([ability, education, income, voucher]).T
df = pd.DataFrame(data, columns = ['ability', 'education', 'income', 'voucher'])
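Before turning to instrumental variables, it helps to see why a plain regression of income on education is not enough here. The sketch below regenerates data with the same coefficients as the cell above and computes the OLS slope while ignoring `ability`. The fixed seed and the use of `np.random.default_rng` are assumptions made for reproducibility; the notebook itself does not set a seed.

```python
import numpy as np

# Assumption: a fixed seed, which the original notebook does not set.
rng = np.random.default_rng(0)
n_points = 1000

ability = rng.normal(0, 3, size=n_points)   # confounder
voucher = rng.normal(2, 1, size=n_points)   # instrument
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education

# OLS slope of income on education, with ability left out of the regression.
ols_slope = np.cov(education, income)[0, 1] / np.var(education, ddof=1)

# Population value of that slope under this data-generating process:
# cov(education, income) / var(education) = (4 * 10.25 + 2 * 9) / 10.25
population_ols_slope = 59 / 10.25

print("true effect:          ", 4)
print("naive OLS slope:      ", ols_slope)
print("population OLS slope: ", population_ols_slope)
```

Because `ability` raises both education and income, the naive slope settles near 5.76 rather than the true effect of 4, which is the bias the instrument is meant to remove.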
[3]:
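# Outcome and regressor matrices (patsy adds the intercept column)
income_vec, regressors = ps.dmatrices("income ~ education", data=df)
# Instrument matrix: intercept plus voucher
instruments = ps.dmatrix("voucher", data=df)

# IV2SLS(endog, exog, instrument): outcome, regressors, instruments
m = IV2SLS(income_vec, regressors, instruments).fit()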
m.summary()
[3]:
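                          IV2SLS Regression Results
==============================================================================
Dep. Variable:                 income   R-squared:                       0.899
Model:                         IV2SLS   Adj. R-squared:                  0.899
Method:                     Two Stage   F-statistic:                     160.6
                        Least Squares   Prob (F-statistic):           3.05e-34
Date:                Tue, 07 Jan 2020
Time:                        14:32:06
No. Observations:                1000
Df Residuals:                     998
Df Model:                           1
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
Intercept      8.3670      1.987      4.211      0.000       4.468      12.266
education      4.2607      0.336     12.674      0.000       3.601       4.920
==============================================================================
Omnibus:                        0.871   Durbin-Watson:                   2.058
Prob(Omnibus):                  0.647   Jarque-Bera (JB):                0.953
Skew:                           0.059   Prob(JB):                        0.621
Kurtosis:                       2.904   Cond. No.                         14.3
==============================================================================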
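To make the "two stage" in two-stage least squares concrete, here is a minimal sketch that performs both stages by hand with `np.linalg.lstsq`. It regenerates data with the same coefficients as above, using an assumed fixed seed, so its numbers will not match the summary table exactly. It is an illustration of the technique, not a description of how `IV2SLS` is implemented internally.

```python
import numpy as np

# Assumption: a fixed seed, which the original notebook does not set.
rng = np.random.default_rng(0)
n_points = 1000

ability = rng.normal(0, 3, size=n_points)   # confounder
voucher = rng.normal(2, 1, size=n_points)   # instrument
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education

# Stage 1: regress the treatment on the instrument (with intercept).
stage1_design = np.column_stack([np.ones(n_points), voucher])
pi_hat, *_ = np.linalg.lstsq(stage1_design, education, rcond=None)
education_hat = stage1_design @ pi_hat

# Stage 2: regress the outcome on the fitted treatment (with intercept).
stage2_design = np.column_stack([np.ones(n_points), education_hat])
beta_hat, *_ = np.linalg.lstsq(stage2_design, income, rcond=None)
iv_slope = beta_hat[1]

print("first-stage slope:", pi_hat[1])
print("2SLS slope:       ", iv_slope)
```

The point estimate from the manual second stage is the 2SLS coefficient, but the standard errors an ordinary regression would report for that stage are not valid, which is one reason to use a dedicated estimator such as `IV2SLS` in practice.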
[4]:
model=CausalModel(
        data = df,
        treatment='education',
        outcome='income',
        common_causes=['ability'],
        instruments=['voucher']
        )

identified_estimand = model.identify_effect()

estimate = model.estimate_effect(identified_estimand,
        method_name="iv.instrumental_variable", test_significance=True
)
print(estimate)

WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.
INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named "Unobserved Confounders" to reflect this.
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['education'] on outcome ['income']
INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['U', 'ability']
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y
INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['voucher']
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:

Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, voucher))**(-1)
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income

*** Causal Estimate ***

## Target estimand
Estimand type: nonparametric-ate
### Estimand : 1
Estimand name: backdoor
Estimand expression:
     d
────────────(Expectation(income|ability))
d[education]
Estimand assumption 1, Unconfoundedness: If U→{education} and U→income then P(income|education,ability,U) = P(income|education,ability)
### Estimand : 2
Estimand name: iv
Estimand expression:
Expectation(Derivative(income, [voucher])*Derivative([education], [voucher])**(-1))
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)

## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: nonparametric-ate
Estimand expression:

Expectation(Derivative(income, voucher))⋅Expectation(Derivative(education, voucher))**(-1)
Estimand assumption 1, As-if-random: If U→→income then ¬(U →→{voucher})
Estimand assumption 2, Exclusion: If we remove {voucher}→{education}, then ¬({voucher}→income)
Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['education'] is affected in the same way by common causes of ['education'] and income
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome income is affected in the same way by common causes of ['education'] and income

## Estimate
Value: 4.2606685045720365

## Statistical Significance
p-value: <0.001
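The Wald estimand printed above is a ratio: the effect of the instrument on the outcome divided by the effect of the instrument on the treatment. As a rough sketch of that ratio, the code below estimates both effects as simple regression slopes with `np.polyfit` and divides them. The data is regenerated with an assumed fixed seed, so the value differs from the estimate reported above, and this is not meant to reproduce DoWhy's internal computation.

```python
import numpy as np

# Assumption: a fixed seed, which the original notebook does not set.
rng = np.random.default_rng(0)
n_points = 1000

ability = rng.normal(0, 3, size=n_points)   # confounder
voucher = rng.normal(2, 1, size=n_points)   # instrument
education = rng.normal(5, 1, size=n_points) + 1 * ability + 0.5 * voucher
income = rng.normal(10, 3, size=n_points) + 2 * ability + 4 * education

# Reduced form: slope of income on voucher.
reduced_form_slope = np.polyfit(voucher, income, 1)[0]

# First stage: slope of education on voucher.
first_stage_slope = np.polyfit(voucher, education, 1)[0]

# Ratio of the two slopes, following the shape of the Wald estimand.
wald_estimate = reduced_form_slope / first_stage_slope

print("reduced-form slope:", reduced_form_slope)
print("first-stage slope: ", first_stage_slope)
print("Wald estimate:     ", wald_estimate)
```

With one instrument and one treatment, this ratio equals cov(voucher, income) / cov(voucher, education). The first-stage slope sits in the denominator, so an instrument that barely moves the treatment makes the estimate unstable.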