# Mediation analysis with DoWhy: Direct and Indirect Effects

In [None]:
import numpy as np
import pandas as pd

from dowhy import CausalModel
import dowhy.datasets

# Suppress warnings to keep the notebook output readable
import warnings
warnings.filterwarnings('ignore')

## Creating a dataset

In [None]:
# Create a dataset with a single confounder (num_common_causes) and a single mediator (num_frontdoor_variables)
data = dowhy.datasets.linear_dataset(beta=10, num_common_causes=1, num_samples=10000,
                                     num_instruments=0, num_effect_modifiers=0,
                                     num_treatments=1,
                                     num_frontdoor_variables=1,
                                     treatment_is_binary=False,
                                     outcome_is_binary=False)
df = data['df']
print(df.head())

## Step 1: Modeling the causal mechanism
The dataset above follows a causal graph that satisfies the frontdoor criterion: there is no direct effect of the treatment on the outcome, and the entire effect is mediated through the frontdoor variable FD0.

In [None]:
from IPython.display import Image, display

model = CausalModel(df,
                    data["treatment_name"], data["outcome_name"],
                    data["gml_graph"],
                    missing_nodes_as_confounders=True)

# Render the causal graph to causal_model.png and show it inline
model.view_model()
display(Image(filename="causal_model.png"))

## Step 2: Identifying the natural direct and indirect effects
We use the `estimand_type` argument to specify whether the target estimand is the **natural direct effect** or the **natural indirect effect**. For definitions, see [Interpretation and Identification of Causal Mediation](https://ftp.cs.ucla.edu/pub/stat_ser/r389-imai-etal-commentary-r421-reprint.pdf) by Judea Pearl.

**Natural direct effect**: Effect due to the path v0->y

**Natural indirect effect**: Effect due to the path v0->FD0->y (mediated by FD0).
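For reference, Pearl's counterfactual definitions can be written with potential outcomes $Y(t, m)$ and potential mediator values $M(t)$, comparing a treatment level $t_1$ against a baseline $t_0$ (this notation is ours, not DoWhy's):

$$
\begin{aligned}
\mathrm{NDE} &= \mathbb{E}\big[Y(t_1, M(t_0))\big] - \mathbb{E}\big[Y(t_0, M(t_0))\big] \\
\mathrm{NIE} &= \mathbb{E}\big[Y(t_0, M(t_1))\big] - \mathbb{E}\big[Y(t_0, M(t_0))\big]
\end{aligned}
$$

As an illustration, assume a linear model $M = c\,T + \varepsilon_M$ and $Y = a\,T + b\,M + \varepsilon_Y$, where $a$, $b$, $c$ are hypothetical coefficients. Substituting gives $\mathrm{NDE} = a\,(t_1 - t_0)$ and $\mathrm{NIE} = b\,c\,(t_1 - t_0)$, and in this linear case the total effect is their sum. The simulated dataset here has no direct edge ($a = 0$), so the whole effect is indirect.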

In [None]:
# Natural direct effect (NDE)
identified_estimand_nde = model.identify_effect(estimand_type="nonparametric-nde",
                                                proceed_when_unidentifiable=True)
print(identified_estimand_nde)

In [None]:
# Natural indirect effect (NIE)
identified_estimand_nie = model.identify_effect(estimand_type="nonparametric-nie",
                                                proceed_when_unidentifiable=True)
print(identified_estimand_nie)

## Step 3: Estimation of the effect
Currently, only two-stage linear regression is supported for estimation. We plan to add the non-parametric Monte Carlo method described in [Imai, Keele and Yamamoto (2010)](https://projecteuclid.org/euclid.ss/1280841733).

#### Natural Indirect Effect
The estimator reduces mediation effect estimation to a series of backdoor effect estimations:

1. The first-stage model estimates the effect of the treatment (v0) on the mediator (FD0).
2. The second-stage model estimates the effect of the mediator (FD0) on the outcome (y).
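The two stages above can be sketched in plain NumPy under a linear model. This is a minimal sketch, not DoWhy's implementation: the simulated data, the slopes 1.5 and 4.0, and the helper `ols_coefs` are hypothetical. Combining the stages by multiplying the two slopes is the standard product-of-coefficients rule for linear models; we assume, without having checked the source, that `mediation.two_stage_regression` does the equivalent with `LinearRegressionEstimator` in both stages.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical linear data mirroring the notebook's graph: W confounds T and Y,
# M fully mediates T -> Y, and there is no direct T -> Y edge.
w = rng.normal(size=n)
t = 2.0 * w + rng.normal(size=n)
m = 1.5 * t + rng.normal(size=n)            # true T -> M slope: 1.5
y = 4.0 * m + 3.0 * w + rng.normal(size=n)  # true M -> Y slope: 4.0


def ols_coefs(X, target):
    """Least-squares slopes of target on the columns of X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(target)), X])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs[1:]


# Stage 1: effect of T on M (the backdoor path T <- W -> Y <- M is blocked by the collider Y).
c_hat = ols_coefs(t, m)[0]
# Stage 2: effect of M on Y, adjusting for T and W to block M <- T <- W -> Y.
b_hat = ols_coefs(np.column_stack([m, t, w]), y)[0]

# Product of coefficients: the indirect effect per unit change in T.
nie_hat = c_hat * b_hat
print(f"c_hat={c_hat:.2f}  b_hat={b_hat:.2f}  nie_hat={nie_hat:.2f}")
```

In the notebook's dataset, the corresponding columns are v0 (treatment), FD0 (mediator) and y (outcome).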

In [None]:
import dowhy.causal_estimators.linear_regression_estimator

# Estimate the NIE with two-stage linear regression (no confidence intervals or significance tests)
causal_estimate_nie = model.estimate_effect(identified_estimand_nie,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params={
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            })
print(causal_estimate_nie)

The estimate matches the true natural indirect effect, up to sampling noise.

In [None]:
print(causal_estimate_nie.value, data["ate"])

The true value is stored under the key `ate` (average treatment effect): because the simulated dataset has no direct effect, the total effect equals the natural indirect effect.

#### Natural Direct Effect
Next, let us check whether the direct-effect estimator returns the correct value of zero.

In [None]:
# Estimate the NDE with the same two-stage linear regression setup
causal_estimate_nde = model.estimate_effect(identified_estimand_nde,
                                            method_name="mediation.two_stage_regression",
                                            confidence_intervals=False,
                                            test_significance=False,
                                            method_params={
                                                'first_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator,
                                                'second_stage_model': dowhy.causal_estimators.linear_regression_estimator.LinearRegressionEstimator
                                            })
print(causal_estimate_nde)
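Why should the direct effect come out as zero? Under a linear model, the direct effect is the treatment's coefficient once the mediator is held fixed. For OLS fits that share the same covariates, the estimated total effect also splits exactly into estimated direct plus indirect parts. The NumPy sketch below illustrates this on hypothetical simulated data (the coefficients, the helper `ols_coefs`, and the variable names are ours, not DoWhy's). It is not a claim about how `mediation.two_stage_regression` computes the NDE internally.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Hypothetical linear data: W confounds T and Y, M mediates T -> Y, and there is
# no direct T -> Y edge, mirroring the zero direct effect in this notebook's dataset.
w = rng.normal(size=n)
t = 2.0 * w + rng.normal(size=n)
m = 1.5 * t + rng.normal(size=n)
y = 4.0 * m + 3.0 * w + rng.normal(size=n)


def ols_coefs(X, target):
    """Least-squares slopes of target on the columns of X (intercept fitted, then dropped)."""
    design = np.column_stack([np.ones(len(target)), X])
    coefs, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coefs[1:]


# Total effect of T: adjust only for the confounder W (backdoor adjustment).
total_hat = ols_coefs(np.column_stack([t, w]), y)[0]
# Direct effect of T: additionally hold the mediator M fixed.
nde_hat = ols_coefs(np.column_stack([t, m, w]), y)[0]
# Indirect effect: (T -> M slope given W) times (M -> Y slope given T and W).
nie_hat = ols_coefs(np.column_stack([t, w]), m)[0] * ols_coefs(np.column_stack([m, t, w]), y)[0]

print(f"total={total_hat:.2f}  nde={nde_hat:.2f}  nie={nie_hat:.2f}")
```

Because both stages condition on the same covariates, `total_hat` equals `nde_hat + nie_hat` up to floating-point error. This is the OLS omitted-variable identity, not a property of this particular sample.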

## Step 4: Refutations
TODO