Mediation analysis with DoWhy: Direct and Indirect Effects

[1]:
import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.datasets

# Warnings and logging
import warnings
warnings.filterwarnings('ignore')
import logging
logging.getLogger("dowhy").setLevel(logging.INFO)

Creating a dataset

[2]:
# Creating a dataset with a single confounder and a single mediator (num_frontdoor_variables)
data = dowhy.datasets.linear_dataset(10, num_common_causes=1, num_samples=10000,
                                     num_instruments=0, num_effect_modifiers=0,
                                     num_treatments=1,
                                     num_frontdoor_variables=1,
                                     treatment_is_binary=False,
                                    outcome_is_binary=False)
df = data['df']
print(df.head())
        FD0        W0        v0          y
0  1.661677  0.635219  0.720002   7.712011
1  0.859131 -0.922282 -1.202562  -1.613252
2 -3.239856 -0.529720 -0.938528 -11.898849
3 -1.646743 -1.118923 -1.896495  -9.831622
4 -1.763967  0.305422 -1.590754  -3.820278

Step 1: Modeling the causal mechanism

We create a dataset following a causal graph based on the frontdoor criterion. That is, there is no direct effect of the treatment on outcome; all effect is mediated through the frontdoor variable FD0.

[3]:
model = CausalModel(df,
                    data["treatment_name"],data["outcome_name"],
                    data["gml_graph"],
                   missing_nodes_as_confounders=True,
                   logging_level=logging.INFO)

model.view_model()
from IPython.display import Image, display
display(Image(filename="causal_model.png"))
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['v0'] on outcome ['y']
../_images/example_notebooks_dowhy_mediation_analysis_5_1.png

Step 2: Identifying the natural direct and indirect effects

We use the estimand_type argument to specify that the target estimand should be for a natural direct effect or the natural indirect effect. For definitions, see Interpretation and Identification of Causal Mediation by Judea Pearl.

Natural direct effect: Effect due to the path v0->y

Natural indirect effect: Effect due to the path v0->FD0->y (mediated by FD0).

[4]:
# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde",
                                            proceed_when_unidentifiable=True)
print(identified_estimand_nde)
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Mediators for treatment and outcome:['FD0']
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
Estimand type: nonparametric-nde

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y|FD0, [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

[5]:
# Natural indirect effect (nie)
identified_estimand_nie = model.identify_effect(estimand_type="nonparametric-nie",
                                            proceed_when_unidentifiable=True)
print(identified_estimand_nie)
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Mediators for treatment and outcome:['FD0']
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
Estimand type: nonparametric-nie

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

Step 3: Estimation of the effect

Currently only two stage linear regression is supported for estimation. We plan to add a non-parametric Monte Carlo method soon as described in Imai, Keele and Yamamoto (2010).

Natural Indirect Effect

The estimator converts the mediation effect estimation to a series of backdoor effect estimations. 1. The first-stage model estimates the effect from treatment (v0) to the mediator (FD0). 2. The second-stage model estimates the effect from mediator (FD0) to the outcome (Y).

[6]:
import dowhy.causal_estimators.linear_regression_estimator
causal_estimate_nde = model.estimate_effect(identified_estimand_nie,
                                        method_name="mediation.two_stage_regression",
                                       confidence_intervals=False,
                                       test_significance=False,
                                        method_params = {
                                            'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                            'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                        }
                                       )
print(causal_estimate_nde)
INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator
INFO:dowhy.causal_estimator:b: FD0~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~FD0+v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-nie

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
(b: FD0~v0+W0)*(b: y~FD0+v0+W0)
Target units: ate

## Estimate
Mean value: 1.1930272534225996

Note that the value equals the true value of the natural indirect effect (up to random noise).

[7]:
print(causal_estimate_nde.value, data["ate"])
1.1930272534225996 1.1789881278328593

The parameter is called ate because in the simulated dataset, the direct effect is set to be zero.

Natural Direct Effect

Now let us check whether the direct effect estimator returns the (correct) estimate of zero.

[8]:
causal_estimate_nie = model.estimate_effect(identified_estimand_nde,
                                        method_name="mediation.two_stage_regression",
                                       confidence_intervals=False,
                                       test_significance=False,
                                        method_params = {
                                            'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                            'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                        }
                                       )
print(causal_estimate_nie)
INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator
INFO:dowhy.causal_estimator:b: FD0~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~FD0+v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-nde

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y|FD0, [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
(b: y~v0+W0) - ((b: FD0~v0+W0)*(b: y~FD0+v0+W0))
Target units: ate

## Estimate
Mean value: 9.684579195123888e-05

Step 4: Refutations

TODO