{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "## Sensitivity Analysis for Linear Regression Models\n", "Sensitivity analysis helps us examine how sensitive a result is against the possibility of unobserved confounding. The current method only supports linear regression estimator.
\n", "The partial R^2 of treatment with outcome shows how strongly confounders explaining all the residual outcome variation would have to be associated with the treatment to eliminate the estimated effect.
\n", "The robustness value measures the minimum strength of association unobserved confounding should have with both treatment and outcome in order to change the conclusions.
\n", "Robustness value close to 1 means the treatment effect can handle strong confounders explaining almost all residual variation of the treatment and the outcome.
\n", "Robustness value close to 0 means that even very weak confounders can also change the results.
\n", "Benchmarking examines the sensitivity of causal inferences to plausible strengths of the omitted confounders.
\n", "This method is based on https://carloscinelli.com/files/Cinelli%20and%20Hazlett%20(2020)%20-%20Making%20Sense%20of%20Sensitivity.pdf " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 1: Load required packages" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os, sys\n", "sys.path.append(os.path.abspath(\"../../../\"))\n", "import dowhy\n", "from dowhy import CausalModel\n", "import pandas as pd\n", "import numpy as np\n", "import dowhy.datasets \n", "\n", "# Config dict to set the logging level\n", "import logging.config\n", "DEFAULT_LOGGING = {\n", " 'version': 1,\n", " 'disable_existing_loggers': False,\n", " 'loggers': {\n", " '': {\n", " 'level': 'ERROR',\n", " },\n", " }\n", "}\n", "\n", "logging.config.dictConfig(DEFAULT_LOGGING)\n", "# Disabling warnings output\n", "import warnings\n", "from sklearn.exceptions import DataConversionWarning\n", "#warnings.filterwarnings(action='ignore', category=DataConversionWarning)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 2: Load the dataset \n", "We create a dataset with linear relationships between common causes and treatment, and common causes and outcome. Beta is the true causal effect." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(100) \n", "data = dowhy.datasets.linear_dataset( beta = 10,\n", " num_common_causes = 7,\n", " num_samples = 500,\n", " num_treatments = 1,\n", " stddev_treatment_noise =10,\n", " stddev_outcome_noise = 5\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = CausalModel(\n", " data=data[\"df\"],\n", " treatment=data[\"treatment_name\"],\n", " outcome=data[\"outcome_name\"],\n", " graph=data[\"gml_graph\"],\n", " test_significance=None,\n", " )\n", "model.view_model()\n", "from IPython.display import Image, display\n", "display(Image(filename=\"causal_model.png\"))\n", "data['df'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 3: Create Causal Model\n", "Remove one of the common causes to simulate unobserved confounding" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "data[\"df\"] = data[\"df\"].drop(\"W4\", axis = 1)\n", "graph_str = 'graph[directed 1node[ id \"y\" label \"y\"]node[ id \"W0\" label \"W0\"] node[ id \"W1\" label \"W1\"] node[ id \"W2\" label \"W2\"] node[ id \"W3\" label \"W3\"] node[ id \"W5\" label \"W5\"] node[ id \"W6\" label \"W6\"]node[ id \"v0\" label \"v0\"]edge[source \"v0\" target \"y\"]edge[ source \"W0\" target \"v0\"] edge[ source \"W1\" target \"v0\"] edge[ source \"W2\" target \"v0\"] edge[ source \"W3\" target \"v0\"] edge[ source \"W5\" target \"v0\"] edge[ source \"W6\" target \"v0\"]edge[ source \"W0\" target \"y\"] edge[ source \"W1\" target \"y\"] edge[ source \"W2\" target \"y\"] edge[ source \"W3\" target \"y\"] edge[ source \"W5\" target \"y\"] edge[ source \"W6\" target \"y\"]]'\n", "model = CausalModel(\n", " data=data[\"df\"],\n", " treatment=data[\"treatment_name\"],\n", " outcome=data[\"outcome_name\"],\n", " graph=graph_str,\n", " test_significance=None,\n", " )\n", "model.view_model()\n", "from IPython.display import Image, display\n", "display(Image(filename=\"causal_model.png\"))\n", "data['df'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 4: Identification" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n", "print(identified_estimand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 5: Estimation\n", "Currently only Linear Regression estimator is supported for Linear Sensitivity Analysis" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimate = model.estimate_effect(identified_estimand,method_name=\"backdoor.linear_regression\")\n", "print(estimate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Step 6: Refutation and Sensitivity Analysis\n", "identified_estimand: An instance of the identifiedEstimand class that provides the information with respect to which causal pathways are employed when the treatment effects the outcome
\n", "estimate: An instance of CausalEstimate class. The estimate obtained from the estimator for the original data.
\n", "method_name: Refutation method name
\n", "simulated_method_name: \"linear-partial-R2\" for Linear Sensitivity Analysis
\n", "benchmark_common_causes: Name of the covariates used to bound the strengths of unobserved confounder
\n", "percent_change_estimate: It is the percentage of reduction of treatment estimate that could alter the results (default = 1) if percent_change_estimate = 1, the robustness value describes the strength of association of confounders with treatment and outcome in order to reduce the estimate by 100% i.e bring it down to 0.
\n", "confounder_increases_estimate: confounder_increases_estimate = True implies that confounder increases the absolute value of estimate and vice versa. Default is confounder_increases_estimate = False i.e. the considered confounders pull estimate towards zero
\n", "effect_fraction_on_treatment: Strength of association between unobserved confounder and treatment compared to benchmark covariate
\n", "effect_fraction_on_outcome: Strength of association between unobserved confounder and outcome compared to benchmark covariate
\n", "null_hypothesis_effect: assumed effect under the null hypothesis (default = 0)
\n", "plot_estimate: Generate contour plot for estimate while performing sensitivity analysis. (default = True). \n", " To override the setting, set plot_estimate = False." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": false }, "outputs": [], "source": [ "refute = model.refute_estimate(identified_estimand, estimate ,\n", " method_name = \"add_unobserved_common_cause\",\n", " simulated_method_name = \"linear-partial-R2\", \n", " benchmark_common_causes = [\"W3\"],\n", " effect_fraction_on_treatment = [ 1,2,3]\n", " )" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The x axis shows hypothetical partial R2 values of unobserved confounder(s) with the treatment. The y axis shows hypothetical partial R2 of unobserved confounder(s) with the outcome.
\n", "The contour levels represent adjusted t-values or estimates for unobserved confounders with hypothetical partialR2 values when these would be included in full regression model.
\n", "The red line is the critical threshold: confounders with such strength or stronger are sufficient to invalidate the research conclusions." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute.stats" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute.benchmarking_results" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Parameter List for plot function\n", "plot_type: \"estimate\" or \"t-value\"
\n", "critical_value: special reference value of the estimate or t-value that will be highlighted in the plot
\n", "x_limit: plot's maximum x_axis value (default = 0.8)
\n", "y_limit: plot's minimum y_axis value (default = 0.8)
\n", "num_points_per_contour: number of points to calculate and plot each contour line (default = 200)
\n", "plot_size: tuple denoting the size of the plot (default = (7,7))
\n", "contours_color: color of contour line (default = blue)
\n", " String or array. If array, lines will be plotted with the specific color in ascending order.
\n", " critical_contour_color: color of threshold line (default = red)
\n", " label_fontsize: fontsize for labelling contours (default = 9)
\n", " contour_linewidths: linewidths for contours (default = 0.75)
\n", " contour_linestyles: linestyles for contours (default = \"solid\")\n", " See : https://matplotlib.org/3.5.0/gallery/lines_bars_and_markers/linestyles.html
\n", " contours_label_color: color of contour line label (default = black)
\n", " critical_label_color: color of threshold line label (default = red)
\n", " unadjusted_estimate_marker: marker type for unadjusted estimate in the plot (default = 'D')\n", " See: https://matplotlib.org/stable/api/markers_api.html
unadjusted_estimate_color: marker color for unadjusted estimate in the plot (default = \"black\")
\n", " adjusted_estimate_marker: marker type for bias adjusted estimates in the plot (default = '^')
adjusted_estimate_color: marker color for bias adjusted estimates in the plot (default = \"red\")
\n", " legend_position:tuple denoting the position of the legend (default = (1.6, 0.6))
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute.plot(plot_type = 't-value')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The t statistic is the coefficient divided by its standard error. The higher the t-value, the greater the evidence to reject the null hypothesis.
\n", "According to the above plot,at 5% significance level, the null hypothesis of zero effect would be rejected given the above confounders. " ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "print(refute)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Sensitivity Analysis for dataset with no confounders\n", "We now run the sensitivity analysis for the same dataset but without dropping any variable.
\n", "We get a robustness value goes from 0.55 to 0.95 which means that treatment effect can handle strong confounders explaining almost all residual variation of the treatment and the outcome.
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "np.random.seed(100) \n", "data = dowhy.datasets.linear_dataset( beta = 10,\n", " num_common_causes = 7,\n", " num_samples = 500,\n", " num_treatments = 1,\n", " stddev_treatment_noise=10,\n", " stddev_outcome_noise = 1\n", " )" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = CausalModel(\n", " data=data[\"df\"],\n", " treatment=data[\"treatment_name\"],\n", " outcome=data[\"outcome_name\"],\n", " graph=data[\"gml_graph\"],\n", " test_significance=None,\n", " )\n", "model.view_model()\n", "from IPython.display import Image, display\n", "display(Image(filename=\"causal_model.png\"))\n", "data['df'].head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n", "print(identified_estimand)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimate = model.estimate_effect(identified_estimand,method_name=\"backdoor.linear_regression\")\n", "print(estimate)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute = model.refute_estimate(identified_estimand, estimate ,\n", " method_name = \"add_unobserved_common_cause\",\n", " simulated_method_name = \"linear-partial-R2\", \n", " benchmark_common_causes = [\"W3\"],\n", " effect_fraction_on_treatment = [ 1,2,3])" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute.plot(plot_type = 't-value')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "print(refute)" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "scrolled": true }, "outputs": [], "source": [ "refute.stats" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "refute.benchmarking_results" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.6.10" } }, "nbformat": 4, "nbformat_minor": 4 }