{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# A Simple Example on Creating a Custom Refutation Using User-Defined Outcome Functions\n", "In this experiment, we define a linear dataset and estimate the causal effect of the treatment with the instrumental variable estimator. To test the estimator, we then replace the outcome with a dummy outcome generated by a linear expression of the confounders. Because the dummy outcome does not depend on the treatment, the effect of the treatment on it should be zero, and that is exactly what we expect the refuter to report." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Import Dependencies" ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "from dowhy import CausalModel\n", "import dowhy.datasets\n", "import pandas as pd\n", "import numpy as np\n", "\n", "# Config dict to set the logging level\n", "import logging.config\n", "DEFAULT_LOGGING = {\n", "    'version': 1,\n", "    'disable_existing_loggers': False,\n", "    'loggers': {\n", "        '': {\n", "            'level': 'WARN',\n", "        },\n", "    }\n", "}\n", "\n", "logging.config.dictConfig(DEFAULT_LOGGING)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Create the Dataset\n", "You can change the values of the hyperparameters below to see how the estimated effect changes.\n", "\n", "Variable Guide:\n", "\n", "| Variable Name | Data Type | Interpretation |\n", "|-----------------|-----------|---------------------|\n", "| $Z_i$ | float | Instrument Variable |\n", "| $W_i$ | float | Confounder |\n", "| $v_0$ | float | Treatment |\n", "| $y$ | float | Outcome |\n", "\n" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
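<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
"  <thead>\n",
"    <tr style=\"text-align: right;\">\n",
"      <th></th>\n",
"      <th>Z0</th>\n",
"      <th>W0</th>\n",
"      <th>W1</th>\n",
"      <th>v0</th>\n",
"      <th>y</th>\n",
"    </tr>\n",
"  </thead>\n",
"  <tbody>\n",
"    <tr>\n",
"      <th>0</th>\n",
"      <td>1.0</td>\n",
"      <td>0.112689</td>\n",
"      <td>-0.501474</td>\n",
"      <td>8.076574</td>\n",
"      <td>80.106461</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>1</th>\n",
"      <td>0.0</td>\n",
"      <td>0.645347</td>\n",
"      <td>-0.072829</td>\n",
"      <td>-0.219279</td>\n",
"      <td>-0.092377</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>2</th>\n",
"      <td>0.0</td>\n",
"      <td>0.323480</td>\n",
"      <td>0.989825</td>\n",
"      <td>0.365947</td>\n",
"      <td>6.900517</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>3</th>\n",
"      <td>0.0</td>\n",
"      <td>0.030437</td>\n",
"      <td>1.334423</td>\n",
"      <td>1.740524</td>\n",
"      <td>20.319910</td>\n",
"    </tr>\n",
"    <tr>\n",
"      <th>4</th>\n",
"      <td>1.0</td>\n",
"      <td>1.377841</td>\n",
"      <td>0.628397</td>\n",
"      <td>11.938058</td>\n",
"      <td>125.523936</td>\n",
"    </tr>\n",
"  </tbody>\n",
"</table>\n",
"</div>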
" ], "text/plain": [ "    Z0        W0        W1         v0           y\n", "0  1.0  0.112689 -0.501474   8.076574   80.106461\n", "1  0.0  0.645347 -0.072829  -0.219279   -0.092377\n", "2  0.0  0.323480  0.989825   0.365947    6.900517\n", "3  0.0  0.030437  1.334423   1.740524   20.319910\n", "4  1.0  1.377841  0.628397  11.938058  125.523936" ] }, "execution_count": 2, "metadata": {}, "output_type": "execute_result" } ], "source": [ "# Value of the coefficient [BETA]\n", "BETA = 10\n", "# Number of Common Causes\n", "NUM_COMMON_CAUSES = 2\n", "# Number of Instruments\n", "NUM_INSTRUMENTS = 1\n", "# Number of Samples\n", "NUM_SAMPLES = 100000\n", "# Treatment is Binary\n", "TREATMENT_IS_BINARY = False\n", "data = dowhy.datasets.linear_dataset(beta=BETA,\n", "                                     num_common_causes=NUM_COMMON_CAUSES,\n", "                                     num_instruments=NUM_INSTRUMENTS,\n", "                                     num_samples=NUM_SAMPLES,\n", "                                     treatment_is_binary=TREATMENT_IS_BINARY)\n", "data['df'].head()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Creating the Causal Model" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "model = CausalModel(\n", "    data = data['df'],\n", "    treatment = data['treatment_name'],\n", "    outcome = data['outcome_name'],\n", "    graph = data['gml_graph'],\n", "    instruments = data['instrument_names']\n", ")" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "model.view_model()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The figure above shows the causal graph relating the treatment, the outcome, the confounders and the instrument variable.\n", "- The confounders $W_0$ and $W_1$ affect both the treatment and the outcome.\n", "- The instrument variable $Z_0$ affects the outcome $y$ only through the treatment $v_0$." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Identify the Estimand" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": 
[ "Estimand type: nonparametric-ate\n", "\n", "### Estimand : 1\n", "Estimand name: backdoor\n", "Estimand expression:\n", " d \n", "─────(Expectation(y|W1,W0))\n", "d[v₀] \n", "Estimand assumption 1, Unconfoundedness: If U→{v0} and U→y then P(y|v0,W1,W0,U) = P(y|v0,W1,W0)\n", "\n", "### Estimand : 2\n", "Estimand name: iv\n", "Estimand expression:\n", "Expectation(Derivative(y, [Z0])*Derivative([v0], [Z0])**(-1))\n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "\n", "### Estimand : 3\n", "Estimand name: frontdoor\n", "No such variable found!\n", "\n" ] } ], "source": [ "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n", "print(identified_estimand)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Estimating the Effect" ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "scrolled": true }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "*** Causal Estimate ***\n", "\n", "## Identified estimand\n", "Estimand type: nonparametric-ate\n", "\n", "### Estimand : 1\n", "Estimand name: iv\n", "Estimand expression:\n", "Expectation(Derivative(y, [Z0])*Derivative([v0], [Z0])**(-1))\n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "\n", "## Realized estimand\n", "Realized estimand: Wald Estimator\n", "Realized estimand type: nonparametric-ate\n", "Estimand expression:\n", " -1\n", "Expectation(Derivative(y, Z0))⋅Expectation(Derivative(v0, Z0)) \n", "Estimand assumption 1, As-if-random: If U→→y then ¬(U →→{Z0})\n", "Estimand assumption 2, Exclusion: If we remove {Z0}→{v0}, then ¬({Z0}→y)\n", "Estimand assumption 3, treatment_effect_homogeneity: Each unit's treatment ['v0'] is affected in the same way by common causes of ['v0'] and y\n", "Estimand assumption 4, outcome_effect_homogeneity: Each unit's 
outcome y is affected in the same way by common causes of ['v0'] and y\n", "\n", "Target units: ate\n", "\n", "## Estimate\n", "Mean value: 9.99706705820163\n", "\n" ] } ], "source": [ "causal_estimate = model.estimate_effect(identified_estimand,\n", "                                        method_name=\"iv.instrumental_variable\",\n", "                                        method_params={'iv_instrument_name':'Z0'}\n", "                                        )\n", "print(causal_estimate)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Refuting the Estimate" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Randomly Generated Outcome" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Refute: Use a Dummy Outcome\n", "Estimated effect:0\n", "New effect:4.657654751543888e-05\n", "p value:0.49\n", "\n" ] } ], "source": [ "ref = model.refute_estimate(identified_estimand,\n", "                            causal_estimate,\n", "                            method_name=\"dummy_outcome_refuter\"\n", "                            )\n", "print(ref[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The result shows that the treatment has no effect on the dummy outcome. The new effect is close to zero, which matches our expectation. This shows that if we replace the outcome with randomly generated data, the estimator correctly finds that the effect of the treatment is zero." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Using a Function that Generates the Outcome from the Confounders" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let us define a simple function that generates the outcome as a linear function of the confounders." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [], "source": [ "# Coefficients for W0 and W1, and the intercept of the dummy outcome\n", "coefficients = np.array([1, 2])\n", "bias = 3\n", "\n", "def linear_gen(df):\n", "    # The dummy outcome depends only on the confounders, not on the treatment\n", "    y_new = np.dot(df[['W0', 'W1']].values, coefficients) + bias\n", "    return y_new" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The dummy outcome has the form\n", "$y_{new} = \\beta_0W_0 + \\beta_1W_1 + \\gamma_0$\n", "\n", "where\n", "$\\beta_0=1$, $\\beta_1=2$ and $\\gamma_0=3$." ] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Refute: Use a Dummy Outcome\n", "Estimated effect:0\n", "New effect:-1.1692081553648758e-05\n", "p value:0.47\n", "\n" ] } ], "source": [ "ref = model.refute_estimate(identified_estimand,\n", "                            causal_estimate,\n", "                            method_name=\"dummy_outcome_refuter\",\n", "                            outcome_function=linear_gen\n", "                            )\n", "\n", "print(ref[0])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As in the previous experiment, the estimated effect of the treatment on the dummy outcome is close to zero. The refuter confirms this: the new effect is negligible, and the p value of 0.47 is greater than 0.05 across 100 simulations, so the result is consistent with a true effect of zero." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.5" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 4 }