Mediation analysis with DoWhy: Direct and Indirect Effects

[1]:
import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.datasets

# Warnings and logging
import warnings
warnings.filterwarnings('ignore')
import logging
logging.getLogger("dowhy").setLevel(logging.INFO)

Creating a dataset

[2]:
# Creating a dataset with a single confounder and a single mediator (num_frontdoor_variables)
data = dowhy.datasets.linear_dataset(beta=10, num_common_causes=1, num_samples=10000,
                                     num_instruments=0, num_effect_modifiers=0,
                                     num_treatments=1,
                                     num_frontdoor_variables=1,
                                     treatment_is_binary=False,
                                     outcome_is_binary=False)
df = data['df']
print(df.head())
         FD0        W0        v0          y
0  11.009406  0.521425  2.654356  33.373490
1  -6.848972 -0.637748 -1.947278 -22.097380
2  -0.710087 -0.192628 -0.120618  -2.832689
3  -0.092251 -0.314083 -0.229222  -1.640303
4  13.049299  1.138318  3.028420  41.798294

Step 1: Modeling the causal mechanism

We create a dataset following a causal graph based on the frontdoor criterion. That is, there is no direct effect of the treatment on the outcome; the entire effect is mediated through the frontdoor variable FD0.
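To make this structure concrete, here is a minimal NumPy sketch of a linear frontdoor data-generating process: W0 confounds v0 and y, v0 affects FD0, and only FD0 affects y. It is a simplified stand-in, not DoWhy's actual `linear_dataset` generator, and the coefficients `a`, `b`, `g_v`, `g_y` are hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical coefficients for illustration; not the values linear_dataset draws
a = 2.0    # v0 -> FD0
b = 3.0    # FD0 -> y
g_v = 1.5  # W0 -> v0 (confounding)
g_y = 4.0  # W0 -> y (confounding)

W0 = rng.normal(size=n)
v0 = g_v * W0 + rng.normal(size=n)
FD0 = a * v0 + rng.normal(size=n)            # mediator depends only on v0
y = b * FD0 + g_y * W0 + rng.normal(size=n)  # no v0 term: no direct edge v0 -> y

# Raising v0 by one unit shifts FD0 by a and hence y by a * b
true_effect = a * b
print(true_effect)  # 6.0

# Adjusting for the confounder W0 recovers the total effect by regression
X = np.column_stack([np.ones(n), v0, W0])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
total_hat = coef[1]
print(round(total_hat, 2))
```

Because y has no v0 term, the whole effect of v0 travels through FD0 in this sketch.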

[3]:
from IPython.display import Image, display

model = CausalModel(df,
                    data["treatment_name"], data["outcome_name"],
                    data["gml_graph"],
                    missing_nodes_as_confounders=True,
                    logging_level=logging.INFO)

model.view_model()  # draws the graph; saved as causal_model.png by default
display(Image(filename="causal_model.png"))
INFO:dowhy.causal_model:Model to find the causal effect of treatment ['v0'] on outcome ['y']
[Figure: causal graph of the model, rendered by model.view_model()]

Step 2: Identifying the natural direct and indirect effects

We use the estimand_type argument to specify whether the target estimand is the natural direct effect or the natural indirect effect. For definitions, see "Interpretation and Identification of Causal Mediation" by Judea Pearl.

Natural direct effect: effect due to the path v0->y.
Natural indirect effect: effect due to the path v0->FD0->y (mediated by FD0).
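For reference, Pearl's counterfactual definitions for a change of the treatment T (here v0) from t to t' can be written as below, where M is the mediator (here FD0) and Y_{t, M_{t'}} denotes the outcome when T is set to t and M is set to the value it would take under t'. This is standard notation from the mediation literature, not output produced by DoWhy.

```latex
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\left[ Y_{t',\,M_{t}} \right] - \mathbb{E}\left[ Y_{t,\,M_{t}} \right] \\
\mathrm{NIE} &= \mathbb{E}\left[ Y_{t,\,M_{t'}} \right] - \mathbb{E}\left[ Y_{t,\,M_{t}} \right]
\end{aligned}
```

The NDE changes the treatment while holding the mediator at its baseline value; the NIE holds the treatment at baseline while moving the mediator to the value it would take under the new treatment.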

[4]:
# Natural direct effect (nde)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde",
                                            proceed_when_unidentifiable=True)
print(identified_estimand_nde)
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Mediators for treatment and outcome:['FD0']
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
Estimand type: nonparametric-nde

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

[5]:
# Natural indirect effect (nie)
identified_estimand_nie = model.identify_effect(estimand_type="nonparametric-nie",
                                            proceed_when_unidentifiable=True)
print(identified_estimand_nie)
WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.
INFO:dowhy.causal_identifier:Continuing by ignoring these unobserved confounders because proceed_when_unidentifiable flag is True.
INFO:dowhy.causal_identifier:Mediators for treatment and outcome:['FD0']
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
INFO:dowhy.causal_identifier:All common causes are observed. Causal effect can be identified.
Estimand type: nonparametric-nie

### Estimand : 1
Estimand name: mediation
Estimand expression:

Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

Step 3: Estimation of the effect

Currently, only two-stage linear regression is supported for estimation. We plan to add a non-parametric Monte Carlo method soon, as described in Imai, Keele and Yamamoto (2010).

The estimator converts mediation effect estimation into a series of backdoor effect estimations:

1. The first-stage model estimates the effect of the treatment (v0) on the mediator (FD0).
2. The second-stage model estimates the effect of the mediator (FD0) on the outcome (y).
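The two stages can be sketched with ordinary least squares. In a linear model, the product of the first-stage coefficient on v0 and the second-stage coefficient on FD0 is the effect transmitted along v0->FD0->y; this is the product that appears as the realized estimand in the output of cell [6]. The sketch below is an assumption-laden stand-in: plain NumPy least squares replaces DoWhy's LinearRegressionEstimator, the helper `ols` is hypothetical, and the simulated coefficients (including a direct effect `c`) are made up for illustration.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares slopes of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]  # drop the intercept

rng = np.random.default_rng(1)
n = 10_000
a, b, c = 2.0, 3.0, 1.0  # hypothetical: v0->FD0, FD0->y, direct v0->y
W0 = rng.normal(size=n)
v0 = 1.5 * W0 + rng.normal(size=n)
FD0 = a * v0 + rng.normal(size=n)
y = b * FD0 + c * v0 + 4.0 * W0 + rng.normal(size=n)

# Stage 1: FD0 ~ v0 + W0; the coefficient on v0 estimates a
a_hat = ols(FD0, v0, W0)[0]
# Stage 2: y ~ FD0 + v0 + W0; the coefficients on FD0 and v0 estimate b and c
b_hat, c_hat = ols(y, FD0, v0, W0)[:2]

# Product of the stage coefficients: effect carried along v0 -> FD0 -> y
indirect_hat = a_hat * b_hat
print(round(indirect_hat, 2), round(c_hat, 2))
```

The regression formulas mirror the ones DoWhy logs (FD0~v0+W0 and y~FD0+v0+W0).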

For estimating the natural indirect effect, there is also an additional model that estimates the total effect of the treatment on the outcome, adjusting for the common causes but not the mediator (y~v0+W0 in the log of cell [8]). It uses the same model as given for the second_stage_model parameter.
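The realized estimand printed in cell [8] subtracts the two-stage product from this total-effect regression. For OLS with the same covariates, the total coefficient equals ĉ + â·b̂ exactly, so the difference reproduces the second-stage coefficient on v0, i.e. the effect along the direct path v0->y. A minimal sketch under the same assumptions as before: NumPy least squares in place of DoWhy's estimator, a hypothetical `decompose` helper, and made-up coefficients, run once without and once with a direct effect.

```python
import numpy as np

def ols(y, *regressors):
    """Least-squares slopes of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

def decompose(c, n=10_000, seed=2):
    """Simulate a hypothetical linear mediation model and split its total effect."""
    rng = np.random.default_rng(seed)
    a, b = 2.0, 3.0  # hypothetical v0->FD0 and FD0->y coefficients
    W0 = rng.normal(size=n)
    v0 = 1.5 * W0 + rng.normal(size=n)
    FD0 = a * v0 + rng.normal(size=n)
    y = b * FD0 + c * v0 + 4.0 * W0 + rng.normal(size=n)

    a_hat = ols(FD0, v0, W0)[0]            # first stage
    b_hat, c_hat = ols(y, FD0, v0, W0)[:2]  # second stage
    total = ols(y, v0, W0)[0]               # y ~ v0 + W0: total effect
    return total, a_hat * b_hat, c_hat

for c in (0.0, 1.0):
    total, product, c_hat = decompose(c)
    print(f"c={c}: total={total:.2f}, a*b={product:.2f}, "
          f"total - a*b={total - product:.2f}, c_hat={c_hat:.2f}")
```

With c=0.0, matching a frontdoor graph with no direct edge, the difference is close to zero; with c=1.0 it is close to 1.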

[6]:
import dowhy.causal_estimators.linear_regression_estimator
causal_estimate_nde = model.estimate_effect(identified_estimand_nde,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params={
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            })
print(causal_estimate_nde)
INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator
INFO:dowhy.causal_estimator:b: FD0~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~FD0+v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-nde

### Estimand : 1
Estimand name: mediation
Estimand expression:
Expectation(Derivative(y, [FD0])*Derivative([FD0], [v0]))
Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
(b: FD0~v0+W0) * (b: y~FD0+v0+W0)
Target units: ate

## Estimate
Mean value: 11.704233518070275

Note that the value matches the true effect stored in data["ate"] (up to random noise).

[7]:
print(causal_estimate_nde.value, data["ate"])
11.704233518070275 11.712215459759832

The true value is stored under the key ate, the total (average) effect of v0 on y. Now let us check whether the natural indirect effect estimator returns an estimate close to zero.

[8]:
causal_estimate_nie = model.estimate_effect(identified_estimand_nie,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params={
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            })
print(causal_estimate_nie)
INFO:dowhy.causal_estimator:INFO: Using Two Stage Regression Estimator
INFO:dowhy.causal_estimator:b: FD0~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~FD0+v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
INFO:dowhy.causal_estimator:b: y~v0+W0
INFO:dowhy.causal_estimator:INFO: Using Linear Regression Estimator
*** Causal Estimate ***

## Identified estimand
Estimand type: nonparametric-nie

### Estimand : 1
Estimand name: mediation
Estimand expression:

Estimand assumption 1, Mediation: FD0 intercepts (blocks) all directed paths from v0 to y except the path {v0}→{y}.
Estimand assumption 2, First-stage-unconfoundedness: If U→{v0} and U→{FD0} then P(FD0|v0,U) = P(FD0|v0)
Estimand assumption 3, Second-stage-unconfoundedness: If U→{FD0} and U→{y} then P(y|FD0, v0, U) = P(y|FD0, v0)

## Realized estimand
b: y~v0+W0-(b: FD0~v0+W0) * (b: y~FD0+v0+W0)
Target units: ate

## Estimate
Mean value: 0.000848600067884675

Step 4: Refutations

TODO