{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Simple Example on Creating a Custom Refutation Using User-Defined Outcome Functions\n", "In this experiment, we define a linear dataset and estimate the treatment coefficient with an instrumental variable estimator. To test the estimator, we then replace the outcome with a dummy outcome generated by a linear expression of the confounders. Because the dummy outcome does not depend on the treatment, the estimated effect of the treatment on it should be zero, and that is exactly what we expect the refuter to report." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Dependencies" ] }, { "cell_type": "code", "execution_count": 14, "metadata": {}, "outputs": [], "source": [ "from dowhy import CausalModel\n", "import dowhy.datasets\n", "import pandas as pd\n", "import numpy as np\n", "import logging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the Dataset\n", "You can change the values of the hyperparameters below to see how the estimated effect changes.\n", "\n", "Variable Guide:\n", "\n", "| Variable Name | Data Type | Interpretation |\n", "|-----------------|-----------|---------------------|\n", "| $Z_i$ | float | Instrument Variable |\n", "| $W_i$ | float | Confounder |\n", "| $v_0$ | float | Treatment |\n", "| $y$ | float | Outcome |\n", "\n" ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
<div>\n", "<table>\n", "<thead>\n",
"<tr><th></th><th>Z0</th><th>W0</th><th>W1</th><th>v0</th><th>y</th></tr>\n",
"</thead>\n", "<tbody>\n",
"<tr><th>0</th><td>0.0</td><td>-1.086265</td><td>0.566131</td><td>-4.817235</td><td>-51.000055</td></tr>\n",
"<tr><th>1</th><td>1.0</td><td>-1.446155</td><td>1.078949</td><td>2.891889</td><td>26.582560</td></tr>\n",
"<tr><th>2</th><td>1.0</td><td>0.387138</td><td>1.052355</td><td>12.759312</td><td>134.107854</td></tr>\n",
"<tr><th>3</th><td>0.0</td><td>0.440114</td><td>0.628057</td><td>1.892234</td><td>23.833082</td></tr>\n",
"<tr><th>4</th><td>1.0</td><td>-0.116181</td><td>-1.617984</td><td>8.543409</td><td>77.761051</td></tr>\n",
"</tbody>\n", "</table>\n", "</div>" ], "text/plain": [ " Z0 W0 W1 v0 y\n", "0 0.0 -1.086265 0.566131 -4.817235 -51.000055\n", "1 1.0 -1.446155 1.078949 2.891889 26.582560\n", "2 1.0 0.387138 1.052355 12.759312 134.107854\n", "3 0.0 0.440114 0.628057 1.892234 23.833082\n", "4 1.0 -0.116181 -1.617984 8.543409 77.761051" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Value of the coefficient [BETA]\n", "BETA = 10\n", "# Number of Common Causes\n", "NUM_COMMON_CAUSES = 2\n", "# Number of Instruments\n", "NUM_INSTRUMENTS = 1\n", "# Number of Samples\n", "NUM_SAMPLES = 100000\n", "# Whether the Treatment is Binary\n", "TREATMENT_IS_BINARY = False\n", "data = dowhy.datasets.linear_dataset(beta=BETA,\n", " num_common_causes=NUM_COMMON_CAUSES,\n", " num_instruments=NUM_INSTRUMENTS,\n", " num_samples=NUM_SAMPLES,\n", " treatment_is_binary=TREATMENT_IS_BINARY)\n", "data['df'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the Causal Model" ] }, { "cell_type": "code", "execution_count": 15, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_model:Model to find the causal effect of treatment ['v0'] on outcome ['y']\n" ] } ], "source": [ "model = CausalModel(\n", " data = data['df'],\n", " treatment = data['treatment_name'],\n", " outcome = data['outcome_name'],\n", " graph = data['gml_graph'],\n", " instruments = data['instrument_names'],\n", " logging_level = logging.INFO\n", ")" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "WARNING:dowhy.causal_graph:Warning: Pygraphviz cannot be loaded. Check that graphviz and pygraphviz are installed.\n", "INFO:dowhy.causal_graph:Using Matplotlib for plotting\n" ] }, { "data": { "text/plain": [ "[Figure: causal graph relating the instrument Z0, the confounders W0 and W1, the treatment v0 and the outcome y]" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "model.view_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the above figure, we have a causal graph that shows the relationships between the treatment, outcome, confounders and the instrument variable.\n", "\n", "- The confounders $W_0$ and $W_1$ affect both the treatment and the outcome\n", "- The instrument variable $Z_0$ affects the outcome $y$ only through the treatment $v_0$" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identify the Estimand" ] }, { "cell_type": "code", "execution_count": 20, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['W1', 'W0', 'Unobserved Confounders']\n", "WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "WARN: Do you want to continue by ignoring any unobserved confounders? 
(use proceed_when_unidentifiable=True to disable this prompt) [y/n] y\n" ] }, { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:['Z0']\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "─────(Expectation(y|W1,W0))\n", "d[v₀] \n", "Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,U) = P(y|v0,W1,W0)\n", "### Estimand : 2\n", "Estimand name: iv\n", "Estimand expression:\n", "Expectation(Derivative(y, [Z0])*Derivative([v0], [Z0])**(-1))\n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "\n" ] } ], "source": [ "identified_estimand = model.identify_effect()\n", "print(identified_estimand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating the Effect" ] }, { "cell_type": "code", "execution_count": 22, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator\n", "INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator\n", "Realized estimand type: nonparametric-ate\n", "Estimand expression:\n", " -1\n", "Expectation(Derivative(y, Z0))⋅Expectation(Derivative(v0, Z0)) \n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['v0'] is affected in the same way by common causes of ['v0'] and y\n", "Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y is affected in the same way by common causes of ['v0'] and y\n", "\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "*** Causal Estimate ***\n", "\n", "## Target 
estimand\n", "Estimand type: nonparametric-ate\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "─────(Expectation(y|W1,W0))\n", "d[v₀] \n", "Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,U) = P(y|v0,W1,W0)\n", "### Estimand : 2\n", "Estimand name: iv\n", "Estimand expression:\n", "Expectation(Derivative(y, [Z0])*Derivative([v0], [Z0])**(-1))\n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "\n", "## Realized estimand\n", "Realized estimand: Wald Estimator\n", "Realized estimand type: nonparametric-ate\n", "Estimand expression:\n", " -1\n", "Expectation(Derivative(y, Z0))⋅Expectation(Derivative(v0, Z0)) \n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['v0'] is affected in the same way by common causes of ['v0'] and y\n", "Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome y is affected in the same way by common causes of ['v0'] and y\n", "\n", "## Estimate\n", "Value: 10.006229589077737\n", "\n" ] } ], "source": [ "causal_estimate = model.estimate_effect( identified_estimand,\n", " method_name=\"iv.instrumental_variable\",\n", " method_params={'iv_instrument_name':'Z0'}\n", " )\n", "print(causal_estimate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Refuting the Estimate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Randomly Generated Value" ] }, { "cell_type": "code", "execution_count": 24, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_refuters.dummy_outcome_refuter:Refutation over 100 simulated datasets of Random Data treatment\n", "INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable 
Estimator\n", "INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator\n", "Realized estimand type: nonparametric-ate\n", "Estimand expression:\n", " -1\n", "Expectation(Derivative(dummy_outcome, Z0))⋅Expectation(Derivative(v0, Z0)) \n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['v0'] is affected in the same way by common causes of ['v0'] and dummy_outcome\n", "Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome dummy_outcome is affected in the same way by common causes of ['v0'] and dummy_outcome\n", "\n", "INFO:dowhy.causal_refuters.dummy_outcome_refuter:Making use of Bootstrap as we have more than 100 examples.\n", " Note: The greater the number of examples, the more accurate are the confidence estimates\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Refute: Use a Dummy Outcome\n", "Estimated effect:10.006229589077737\n", "New effect:2.0838738761455298e-06\n", "p value:0.99\n", "\n" ] } ], "source": [ "ref = model.refute_estimate(identified_estimand,\n", " causal_estimate,\n", " method_name=\"dummy_outcome_refuter\"\n", " )\n", "print(ref)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result shows that the treatment has no effect on the dummy outcome. The new estimated effect, about $2 \\times 10^{-6}$, is very close to zero, which matches our expectation. This shows that when we replace the outcome with randomly generated data, the estimator correctly finds that the effect of the treatment is zero." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Function that Generates the Outcome from the Confounders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us define a simple function that generates the outcome as a linear function of the confounders." 
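, "\n", "\n", "A short sketch of why the estimated effect on such an outcome should vanish, assuming (as the causal graph above encodes) that the instrument $Z_0$ is independent of the confounders, and treating $Z_0$ as binary, as in the data preview. The symbols $f$ and $\\hat{\\tau}$ are notation introduced only for this sketch. For a dummy outcome $y_{new} = f(W_0, W_1)$, the Wald estimator in its standard form for a binary instrument gives\n", "\n", "$$\\hat{\\tau} = \\frac{E[y_{new} \\mid Z_0 = 1] - E[y_{new} \\mid Z_0 = 0]}{E[v_0 \\mid Z_0 = 1] - E[v_0 \\mid Z_0 = 0]} = \\frac{E[f(W_0, W_1)] - E[f(W_0, W_1)]}{E[v_0 \\mid Z_0 = 1] - E[v_0 \\mid Z_0 = 0]} = 0$$\n", "\n", "provided the denominator is nonzero, that is, the instrument actually moves the treatment. In a finite sample the two conditional means in the numerator differ slightly, so the refuter should report a value near zero rather than exactly zero."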
] }, { "cell_type": "code", "execution_count": 26, "metadata": {}, "outputs": [], "source": [ "# Coefficients of W0 and W1 in the dummy outcome\n", "coefficients = np.array([1, 2])\n", "# Constant (intercept) term\n", "bias = 3\n", "\n", "def linear_gen(df):\n", "    # Dummy outcome: a linear function of the confounders only\n", "    y_new = np.dot(df[['W0', 'W1']].values, coefficients) + bias\n", "    return y_new" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The basic expression is of the form\n", "\n", "$y_{new} = \\beta_0 W_0 + \\beta_1 W_1 + \\gamma_0$\n", "\n", "where $\\beta_0=1$, $\\beta_1=2$ and $\\gamma_0=3$." ] }, { "cell_type": "code", "execution_count": 28, "metadata": {}, "outputs": [ { "name": "stderr", "output_type": "stream", "text": [ "INFO:dowhy.causal_refuters.dummy_outcome_refuter:Refutation over 100 simulated datasets of Random Data treatment\n", "INFO:dowhy.causal_estimator:INFO: Using Instrumental Variable Estimator\n", "INFO:dowhy.causal_estimator:Realized estimand: Wald Estimator\n", "Realized estimand type: nonparametric-ate\n", "Estimand expression:\n", " -1\n", "Expectation(Derivative(dummy_outcome, Z0))⋅Expectation(Derivative(v0, Z0)) \n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['v0'] is affected in the same way by common causes of ['v0'] and dummy_outcome\n", "Estimand assumption 4, outcome_effect_homogeneity: Each unit's outcome dummy_outcome is affected in the same way by common causes of ['v0'] and dummy_outcome\n", "\n", "INFO:dowhy.causal_refuters.dummy_outcome_refuter:Making use of Bootstrap as we have more than 100 examples.\n", " Note: The greater the number of examples, the more accurate are the confidence estimates\n" ] }, { "name": "stdout", "output_type": "stream", "text": [ "Refute: Use a Dummy Outcome\n", "Estimated effect:10.006229589077737\n", "New effect:-5.262651999486695e-06\n", "p value:1.0\n", "\n" ] } ], "source": [ "ref = model.refute_estimate(identified_estimand,\n", " causal_estimate,\n", 
" method_name=\"dummy_outcome_refuter\",\n", " outcome_function=linear_gen\n", " )\n", "\n", "print(ref)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the previous experiment, the estimated effect of the treatment on the dummy outcome is zero. The refuter confirms this: the new effect is about $-5 \\times 10^{-6}$, with a p value of 1.0 across 100 simulations." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }