{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# A Simple Example on Creating a Custom Refutation Using User-Defined Outcome Functions\n",
    "In this experiment, we define a linear dataset. In order to find the coefficients, we make use of the linear regression estimator. In order to test the effectiveness of the linear estimator, we now replace the outcome value with a dummy produced with the help of a linear expression based on the value of the confounders. This effectively means that the effect of the treatment on the outcome should be zero. This is exactly, what we should expect from the results of the refuter."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Insert Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dowhy import CausalModel\n",
    "import dowhy.datasets\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "\n",
    "# Config dict to set the logging level\n",
    "import logging.config\n",
    "DEFAULT_LOGGING = {\n",
    "    'version': 1,\n",
    "    'disable_existing_loggers': False,\n",
    "    'loggers': {\n",
    "        '': {\n",
    "            'level': 'WARN',\n",
    "        },\n",
    "    }\n",
    "}\n",
    "\n",
    "logging.config.dictConfig(DEFAULT_LOGGING)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create the Dataset\n",
    "You can change the values of the hyper params to see how the effects change, as each parameter changes\n",
    "Variable Guide:\n",
    "\n",
    "| Variable Name   | Data Type |  Interpretation    |\n",
    "|-----------------|-----------|--------------------|\n",
    "|   $Z_i$         |  float    | Insrument Variable |\n",
    "|   $W_i$         |  float    | Confounder         |\n",
    "|   $v_0$         |  float    | Treatment          |\n",
    "|    $y$          |  float    | Outcome            |\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Value of the coefficient [BETA]\n",
    "BETA = 10\n",
    "# Number of Common Causes\n",
    "NUM_COMMON_CAUSES = 2\n",
    "# Number of Instruments\n",
    "NUM_INSTRUMENTS = 1\n",
    "# Number of Samples\n",
    "NUM_SAMPLES = 100000\n",
    "# Treatment is Binary\n",
    "TREATMENT_IS_BINARY = False\n",
    "data = dowhy.datasets.linear_dataset(beta=BETA,\n",
    "                                 num_common_causes=NUM_COMMON_CAUSES,\n",
    "                                 num_instruments=NUM_INSTRUMENTS,\n",
    "                                 num_samples=NUM_SAMPLES,\n",
    "                                 treatment_is_binary=TREATMENT_IS_BINARY)\n",
    "data['df'].head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Creating the Causal Model"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model = CausalModel(\n",
    "    data = data['df'],\n",
    "    treatment = data['treatment_name'],\n",
    "    outcome = data['outcome_name'],\n",
    "    graph = data['gml_graph'],\n",
    "    instruments = data['instrument_names']\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "model.view_model()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In the above figure, we have a causal graph that shows the relationships between the treatment, outcome, confounders and the instrument variable.\n",
    "- The Confounders $W_0$ and $W_1$ affect both the treatment and the outcome\n",
    "- The instrument variable $Z_0$ is able to effect the outcome $y$ through the treatment $x$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Identify the Estimand"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n",
    "print(identified_estimand)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Estimating the Effect"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "causal_estimate = model.estimate_effect( identified_estimand,\n",
    "                                       method_name=\"iv.instrumental_variable\",\n",
    "                                       method_params={'iv_instrument_name':'Z0'}\n",
    "                                       )\n",
    "print(causal_estimate)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Refuting the Estimate"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using a Randomly Generated Outcome"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ref = model.refute_estimate(identified_estimand,\n",
    "                           causal_estimate,\n",
    "                           method_name=\"dummy_outcome_refuter\"\n",
    "                           )\n",
    "print(ref[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The result shows that the treatment does not lead to the outcome. The estimated effect is a value that tends to zero, which matches our expectation. This shows that if we replace the outcome by randomly generated data, the estimator correctly predicts that the influence if treatment is Zero."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Using a Function that Generates the Outcome from the Confounders"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let us define a simple function that generates the outcome as a linear function of the confounders."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "coefficients = np.array([1,2])\n",
    "bias = 3\n",
    "def linear_gen(df):\n",
    "    y_new = np.dot(df[['W0','W1']].values,coefficients) + 3\n",
    "    return y_new"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The basic expression is of the form\n",
    "$y_{new} = \\beta_0W_0 + \\beta_1W_1 + \\gamma_0$\n",
    "\n",
    "where,\n",
    "$\\beta_0=1$, $\\beta_1=2$ and $\\gamma_0=3$"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "ref = model.refute_estimate(identified_estimand,\n",
    "                           causal_estimate,\n",
    "                           method_name=\"dummy_outcome_refuter\",\n",
    "                           outcome_function=linear_gen\n",
    "                           )\n",
    "\n",
    "print(ref[0])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Like the previous experiment, we observe that the estimator shows that the effect of the treatment is Zero. The refuter confirms this as the value obtained through the refutation is quite low and has a p value of >0.05 across 100 simulations."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}