DoWhy: Different estimation methods for causal inference

This is a quick introduction to the DoWhy causal inference library. We load a sample dataset and use different methods to estimate the causal effect of a (pre-specified) treatment variable on a (pre-specified) outcome variable.

First, let us add the required path for Python to find the DoWhy code and load all required packages.

[1]:

import os, sys
sys.path.append(os.path.abspath("../../"))

[2]:

import numpy as np
import pandas as pd
import logging

import dowhy
from dowhy.do_why import CausalModel
import dowhy.datasets

Now, let us load a dataset. For simplicity, we simulate a dataset with linear relationships between common causes and treatment, and common causes and outcome.

The parameter beta is the true causal effect. It is set to 10 here, so a good estimate should be close to 10.
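To make the setup concrete, here is a minimal sketch of how a dataset with linear relationships could be simulated by hand. It is not DoWhy's generator: the function names `simulate_linear_data` and `naive_difference`, the single common cause `x`, the instrument `z`, and every coefficient other than `beta` are assumptions chosen for illustration. Only the standard library is used.

```python
import math
import random

def simulate_linear_data(beta=10.0, num_samples=10000, seed=0):
    """Return rows (x, z, v, y) from a hypothetical linear data-generating process."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, 1.0)   # common cause of treatment and outcome
        z = rng.random()          # instrument: affects the treatment only
        # Binary treatment whose probability increases with x and z
        p_treated = 1.0 / (1.0 + math.exp(-(x + 2.0 * z - 1.0)))
        v = 1 if rng.random() < p_treated else 0
        # Outcome is linear in the treatment and the common cause
        y = beta * v + 3.0 * x + rng.gauss(0.0, 1.0)
        rows.append((x, z, v, y))
    return rows

def naive_difference(rows):
    """Difference in mean outcome between treated and untreated units."""
    treated = [y for _, _, v, y in rows if v == 1]
    control = [y for _, _, v, y in rows if v == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)

rows = simulate_linear_data()
print("Naive difference in means:", naive_difference(rows))
```

Because `x` raises both the treatment probability and the outcome in this sketch, the naive difference in means overshoots `beta`. Removing that kind of bias is the job of the estimation methods.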

[3]:

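# Simulate 10,000 samples with a binary treatment; beta is the true causal effect
data = dowhy.datasets.linear_dataset(beta=10,
        num_common_causes=5,
        num_instruments=2,
        num_samples=10000,
        treatment_is_binary=True)
df = data["df"]  # the simulated data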

Note that the data is held in a pandas DataFrame.

Identifying the causal estimand

We now input a causal graph in the GML graph format.
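The `gml_graph` entry passed to `CausalModel` is a string that describes the graph. As a rough illustration of what such a string can look like, the sketch below builds a GML description from an edge list. The helper `edges_to_gml` and the small four-edge graph are hypothetical, and the exact string that DoWhy generates may differ.

```python
def edges_to_gml(edges):
    """Build a directed-graph description in GML from (source, target) pairs."""
    nodes = sorted({name for edge in edges for name in edge})
    parts = ["graph[directed 1"]
    parts += ['node[id "{0}" label "{0}"]'.format(name) for name in nodes]
    parts += ['edge[source "{0}" target "{1}"]'.format(s, t) for s, t in edges]
    return " ".join(parts) + "]"

# Hypothetical graph: X0 confounds v and y, Z0 is an instrument for v
example_edges = [("X0", "v"), ("X0", "y"), ("Z0", "v"), ("v", "y")]
gml = edges_to_gml(example_edges)
print(gml)
```

Each node is declared once and each causal arrow becomes one `edge` entry, so the direction of every assumed effect is explicit in the string.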

[4]:

# With graph
model = CausalModel(
        data=df,
        treatment=data["treatment_name"],
        outcome=data["outcome_name"],
        graph=data["gml_graph"],
        instruments=data["instrument_names"],
        logging_level=logging.INFO
        )

INFO:dowhy.do_why:Model to find the causal effect of treatment ['v'] on outcome ['y']

[5]:

model.view_model()

[6]:

from IPython.display import Image, display
display(Image(filename="causal_model.png"))

[Figure: the causal graph of the model, rendered from causal_model.png]

This displays the causal graph. Next, we identify the causal estimand and then estimate it with several methods.

[7]:

identified_estimand = model.identify_effect()
print(identified_estimand)

INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['X2', 'Z0', 'X3', 'X4', 'X1', 'X0', 'Z1', 'Unobserved Confounders']
WARNING:dowhy.causal_identifier:There are unobserved common causes. Causal effect cannot be identified.

WARN: Do you want to continue by ignoring these unobserved confounders? [y/n] y

INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z1', 'Z0']

Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

Method 1: Regression

We regress the outcome on the treatment and the common causes; the coefficient of the treatment is the causal estimate.
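To see why adjusting for a common cause matters, the following sketch compares a naive regression of the outcome on the treatment with one that also adjusts for a single common cause. It is an illustration under stated assumptions, not DoWhy's implementation: the data are hypothetical and noise-free, there is one common cause `x` instead of several, and the adjusted coefficient is computed by residualizing on `x` (the Frisch-Waugh-Lovell result) so that only the standard library is needed.

```python
def mean(values):
    return sum(values) / len(values)

def slope(xs, ys):
    """Least-squares slope of ys on xs (with an intercept)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    return sxy / sxx

def residualize(xs, ys):
    """Residuals of ys after a least-squares fit on xs."""
    b, mx, my = slope(xs, ys), mean(xs), mean(ys)
    return [y - (my + b * (x - mx)) for x, y in zip(xs, ys)]

def adjusted_effect(v, x, y):
    """Coefficient of v in the regression y ~ v + x."""
    return slope(residualize(x, v), residualize(x, y))

# Hypothetical noise-free data: y = 10*v + 3*x, with v more common at high x
x = [0, 0, 0, 1, 1, 1, 2, 2, 2]
v = [0, 0, 1, 0, 1, 1, 1, 1, 1]
y = [10 * vi + 3 * xi for vi, xi in zip(v, x)]

naive = slope(v, y)                  # ignores the common cause x
adjusted = adjusted_effect(v, x, y)  # adjusts for x
print("naive:", naive, "adjusted:", adjusted)
```

On these toy data the naive slope is about 13, while the adjusted coefficient recovers the effect of 10 that was built into `y`.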

[8]:

causal_estimate_reg = model.estimate_effect(identified_estimand,
        method_name="backdoor.linear_regression",
        test_significance=True)
print(causal_estimate_reg)
print("Causal Estimate is " + str(causal_estimate_reg.value))

INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 9.999999999999995

## Statistical Significance
p-value: <0.001

Causal Estimate is 10.0

Method 2: Stratification

We will be using propensity scores to stratify units in the data.
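The idea can be sketched in a few lines: sort units by propensity score, cut them into strata, compare treated and untreated units within each stratum, and average the differences. This is a simplified sketch, not DoWhy's estimator. The function `stratified_effect`, the toy data, and the assumption that propensity scores are already known (in practice they are estimated from the common causes) are all hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def stratified_effect(units, num_strata):
    """units: (propensity, treatment, outcome) triples. Returns the size-weighted
    average of within-stratum differences in mean outcome."""
    ordered = sorted(units, key=lambda u: u[0])
    n = len(ordered)
    total, weight = 0.0, 0
    for i in range(num_strata):
        stratum = ordered[i * n // num_strata:(i + 1) * n // num_strata]
        treated = [y for _, v, y in stratum if v == 1]
        control = [y for _, v, y in stratum if v == 0]
        if not treated or not control:
            continue  # no comparison is possible in this stratum
        total += len(stratum) * (mean(treated) - mean(control))
        weight += len(stratum)
    return total / weight

# Hypothetical data with assumed-known propensity scores; the built-in effect is 10
units = (
    [(0.2, 1, 10.0)] + [(0.2, 0, 0.0)] * 4      # low-propensity group
    + [(0.8, 1, 15.0)] * 4 + [(0.8, 0, 5.0)]    # high-propensity group
)
naive = (mean([y for _, v, y in units if v == 1])
         - mean([y for _, v, y in units if v == 0]))
estimate = stratified_effect(units, num_strata=2)
print("naive:", naive, "stratified:", estimate)
```

Within each stratum the treated and untreated units share the same baseline, so the within-stratum differences are not contaminated by the group-level shift that inflates the naive comparison.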

[9]:

causal_estimate_strat = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_stratification")
print(causal_estimate_strat)
print("Causal Estimate is " + str(causal_estimate_strat.value))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Stratification Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 10.061716788484345

Causal Estimate is 10.0617167885

Method 3: Matching

We will be using propensity scores to match units in the data.
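A minimal sketch of the matching idea follows: each treated unit is paired with the untreated unit whose propensity score is closest, and the outcome differences of the pairs are averaged. The function `matched_effect_on_treated` and the toy data are hypothetical, the propensity scores are assumed to be known, and the sketch targets the effect on the treated units only, which may differ from what DoWhy's estimator computes.

```python
def matched_effect_on_treated(units):
    """units: (propensity, treatment, outcome) triples. Pairs each treated unit
    with the nearest-propensity control (with replacement) and averages the
    outcome differences."""
    treated = [(p, y) for p, v, y in units if v == 1]
    control = [(p, y) for p, v, y in units if v == 0]
    diffs = []
    for p_t, y_t in treated:
        # Nearest neighbour in propensity-score distance
        _, y_c = min(control, key=lambda c: abs(c[0] - p_t))
        diffs.append(y_t - y_c)
    return sum(diffs) / len(diffs)

# Hypothetical data: baseline outcome is 0 near p=0.2 and 5 near p=0.8,
# and treatment adds 10 in both groups
units = [
    (0.20, 1, 10.0), (0.21, 0, 0.0), (0.19, 0, 0.0),
    (0.80, 1, 15.0), (0.81, 1, 15.0), (0.79, 0, 5.0),
]
estimate = matched_effect_on_treated(units)
print("matched estimate:", estimate)
```

Matching with replacement lets one control unit serve several treated units, which keeps the pairs close in propensity score at the cost of reusing information.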

[10]:

causal_estimate_match = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_matching")
print(causal_estimate_match)
print("Causal Estimate is " + str(causal_estimate_match.value))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Matching Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 7.391127134286239

Causal Estimate is 7.391127134286239

Method 4: Weighting

We will be using (inverse) propensity scores to assign weights to units in the data.
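As an illustration, the sketch below implements a basic inverse-propensity-weighted estimate of the average treatment effect: treated outcomes are weighted by 1/p and untreated outcomes by 1/(1-p), where p is the propensity score. The function `ipw_effect` and the toy data are hypothetical, the propensity scores are assumed to be known, and DoWhy's estimator may use a different weighting scheme.

```python
def ipw_effect(units):
    """units: (propensity, treatment, outcome) triples.
    Horvitz-Thompson style inverse-propensity-weighted estimate."""
    n = len(units)
    treated_term = sum(y / p for p, v, y in units if v == 1) / n
    control_term = sum(y / (1.0 - p) for p, v, y in units if v == 0) / n
    return treated_term - control_term

# Hypothetical data: the propensity equals the treated share in each group,
# and the built-in effect of the treatment is 10
units = (
    [(0.2, 1, 10.0)] + [(0.2, 0, 0.0)] * 4      # 1 of 5 treated
    + [(0.8, 1, 15.0)] * 4 + [(0.8, 0, 5.0)]    # 4 of 5 treated
)
estimate = ipw_effect(units)
print("IPW estimate:", estimate)
```

Units that received an unlikely treatment status get large weights, so this kind of estimate can become unstable when propensity scores are close to 0 or 1.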

[11]:

causal_estimate_ipw = model.estimate_effect(identified_estimand,
        method_name="backdoor.propensity_score_weighting")
print(causal_estimate_ipw)
print("Causal Estimate is " + str(causal_estimate_ipw.value))

INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator
INFO:dowhy.causal_estimator:b: y~v+X2+Z0+X3+X4+X1+X0+Z1

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
b: y~v+X2+Z0+X3+X4+X1+X0+Z1
## Estimate
Value: 17.631870964903452

Causal Estimate is 17.6318709649

Method 5: Instrumental Variable

We will be using the Wald estimator for the provided instrumental variable.
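For intuition, here is a sketch of the Wald estimator in its classical form with a binary instrument: the change in the mean outcome across instrument values divided by the change in the mean treatment. The binary instrument, the function `wald_estimate`, and the toy data with an unobserved confounder `u` are assumptions made for illustration; the instrument in this notebook need not be binary.

```python
def mean(values):
    return sum(values) / len(values)

def wald_estimate(units):
    """units: (instrument, treatment, outcome) triples with a binary instrument."""
    y1 = mean([y for z, _, y in units if z == 1])
    y0 = mean([y for z, _, y in units if z == 0])
    v1 = mean([v for z, v, _ in units if z == 1])
    v0 = mean([v for z, v, _ in units if z == 0])
    return (y1 - y0) / (v1 - v0)

# Hypothetical data: y = 10*v + 4*u, where u is an unobserved confounder that
# is balanced across z. Each design tuple is (z, u, v).
design = [(0, 0, 0), (0, 0, 0), (0, 1, 0), (0, 1, 1),
          (1, 0, 0), (1, 0, 1), (1, 1, 1), (1, 1, 1)]
units = [(z, v, 10 * v + 4 * u) for z, u, v in design]

naive = (mean([y for _, v, y in units if v == 1])
         - mean([y for _, v, y in units if v == 0]))
estimate = wald_estimate(units)
print("naive:", naive, "Wald:", estimate)
```

The naive comparison is biased by `u`, which the analyst cannot observe, while the Wald ratio recovers the built-in effect because the instrument is unrelated to `u` and affects the outcome only through the treatment.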

[12]:

causal_estimate_iv = model.estimate_effect(identified_estimand,
        method_name="iv.instrumental_variable", method_params={'iv_instrument_name':'Z1'})
print(causal_estimate_iv)
print("Causal Estimate is " + str(causal_estimate_iv.value))

INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
                                                             -1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment v isaffected in the same way by common causes of v and y
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y isaffected in the same way by common causes of v and y

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
                                                             -1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment v isaffected in the same way by common causes of v and y
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y isaffected in the same way by common causes of v and y

## Estimate
Value: 12.992472396947559

Causal Estimate is 12.9924723969

Method 6: Regression Discontinuity

Internally, DoWhy converts the regression discontinuity problem into an equivalent instrumental variable problem.
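One common way to view a regression discontinuity as an instrumental variable problem is sketched below: keep only the units whose running variable lies within a bandwidth of the threshold, treat "above the threshold" as a binary instrument, and apply the Wald ratio to that local sample. How DoWhy constructs the instrument internally is not shown in this notebook, so the function `rd_as_iv_estimate`, the indicator instrument, and the toy data are assumptions.

```python
def mean(values):
    return sum(values) / len(values)

def rd_as_iv_estimate(units, threshold, bandwidth):
    """units: (running_variable, treatment, outcome) triples."""
    # Keep only units close to the threshold
    local = [u for u in units if abs(u[0] - threshold) <= bandwidth]
    above = [u for u in local if u[0] > threshold]
    below = [u for u in local if u[0] <= threshold]
    jump_in_outcome = (mean([y for _, _, y in above])
                       - mean([y for _, _, y in below]))
    jump_in_treatment = (mean([v for _, v, _ in above])
                         - mean([v for _, v, _ in below]))
    return jump_in_outcome / jump_in_treatment

# Hypothetical data: inside the window, y = 10*v + 2
units = [
    (0.10, 0, -50.0), (0.90, 1, 100.0),  # outside the window, ignored
    (0.42, 0, 2.0), (0.45, 0, 2.0), (0.48, 0, 2.0), (0.49, 1, 12.0),
    (0.51, 1, 12.0), (0.53, 1, 12.0), (0.56, 1, 12.0), (0.59, 0, 2.0),
]
estimate = rd_as_iv_estimate(units, threshold=0.5, bandwidth=0.1)
print("RD estimate:", estimate)
```

The bandwidth controls a trade-off: a narrow window keeps units comparable on both sides of the threshold but leaves fewer observations for the estimate.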

[13]:

causal_estimate_regdist = model.estimate_effect(identified_estimand,
        method_name="iv.regression_discontinuity",
        method_params={'rd_variable_name':'Z1',
                       'rd_threshold_value':0.5,
                       'rd_bandwidth': 0.1})
print(causal_estimate_regdist)
print("Causal Estimate is " + str(causal_estimate_regdist.value))

INFO:dowhy.causal_estimator:Using Regression Discontinuity Estimator
INFO:dowhy.causal_estimator:
INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator
INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
                                                             -1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment local_treatment isaffected in the same way by common causes of local_treatment and local_outcome
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome local_outcome isaffected in the same way by common causes of local_treatment and local_outcome

*** Causal Estimate ***

## Target estimand
Estimand type: ate
### Estimand : 1
Estimand name: iv
Estimand expression:
Expectation(Derivative(y, Z1)/Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
### Estimand : 2
Estimand name: backdoor
Estimand expression:
d
──(Expectation(y|X2,Z0,X3,X4,X1,X0,Z1))
dv
Estimand assumption 1, Unconfoundedness: If U→v and U→y then P(y|v,X2,Z0,X3,X4,X1,X0,Z1,U) = P(y|v,X2,Z0,X3,X4,X1,X0,Z1)

## Realized estimand
Realized estimand: Wald Estimator
Realized estimand type: ate
Estimand expression:
                                                             -1
Expectation(Derivative(y, Z1))⋅Expectation(Derivative(v, Z1))
Estimand assumption 1, As-if-random: If U→→y then ¬(U →→Z1,Z0)
Estimand assumption 2, treatment_effect_homogeneity: Each unit's treatment local_treatment isaffected in the same way by common causes of local_treatment and local_outcome
Estimand assumption 3, Exclusion: If we remove {Z1,Z0}→v, then ¬(Z1,Z0→y)
Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome local_outcome isaffected in the same way by common causes of local_treatment and local_outcome

## Estimate
Value: 12.84020418877542

Causal Estimate is 12.8402041888