{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Applying refutation tests to the Lalonde and IHDP datasets"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the Dependencies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dowhy\n",
    "from dowhy import CausalModel\n",
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Loading the Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Infant Health and Development Program Dataset (IHDP)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The measurements used are on the child—birth weight, head circumference, weeks bornpreterm, birth order, ﬁrst born, neonatal health index (see Scott and Bauer 1989), sex, twinstatus—as well as behaviors engaged in during the pregnancy—smoked cigarettes, drankalcohol, took drugs—and measurements on the mother at the time she gave birth—age,marital status, educational attainment (did not graduate from high school, graduated fromhigh school, attended some college but did not graduate, graduated from college), whethershe worked during pregnancy, whether she received prenatal care—and the site (8 total) inwhich the family resided at the start of the intervention. There are 6 continuous covariatesand 19 binary covariates.\n",
    "\n",
    "### Reference\n",
    "Hill, J. L. (2011). Bayesian nonparametric modeling for causal inference. Journal of Computational and Graphical Statistics, 20(1), 217-240. https://doi.org/10.1198/jcgs.2010.08162"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>treatment</th>\n",
       "      <th>y_factual</th>\n",
       "      <th>y_cfactual</th>\n",
       "      <th>mu0</th>\n",
       "      <th>mu1</th>\n",
       "      <th>x1</th>\n",
       "      <th>x2</th>\n",
       "      <th>x3</th>\n",
       "      <th>x4</th>\n",
       "      <th>x5</th>\n",
       "      <th>...</th>\n",
       "      <th>x16</th>\n",
       "      <th>x17</th>\n",
       "      <th>x18</th>\n",
       "      <th>x19</th>\n",
       "      <th>x20</th>\n",
       "      <th>x21</th>\n",
       "      <th>x22</th>\n",
       "      <th>x23</th>\n",
       "      <th>x24</th>\n",
       "      <th>x25</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>True</td>\n",
       "      <td>5.599916</td>\n",
       "      <td>4.318780</td>\n",
       "      <td>3.268256</td>\n",
       "      <td>6.854457</td>\n",
       "      <td>-0.528603</td>\n",
       "      <td>-0.343455</td>\n",
       "      <td>1.128554</td>\n",
       "      <td>0.161703</td>\n",
       "      <td>-0.316603</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>False</td>\n",
       "      <td>6.875856</td>\n",
       "      <td>7.856495</td>\n",
       "      <td>6.636059</td>\n",
       "      <td>7.562718</td>\n",
       "      <td>-1.736945</td>\n",
       "      <td>-1.802002</td>\n",
       "      <td>0.383828</td>\n",
       "      <td>2.244320</td>\n",
       "      <td>-0.629189</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>False</td>\n",
       "      <td>2.996273</td>\n",
       "      <td>6.633952</td>\n",
       "      <td>1.570536</td>\n",
       "      <td>6.121617</td>\n",
       "      <td>-0.807451</td>\n",
       "      <td>-0.202946</td>\n",
       "      <td>-0.360898</td>\n",
       "      <td>-0.879606</td>\n",
       "      <td>0.808706</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>False</td>\n",
       "      <td>1.366206</td>\n",
       "      <td>5.697239</td>\n",
       "      <td>1.244738</td>\n",
       "      <td>5.889125</td>\n",
       "      <td>0.390083</td>\n",
       "      <td>0.596582</td>\n",
       "      <td>-1.850350</td>\n",
       "      <td>-0.879606</td>\n",
       "      <td>-0.004017</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>False</td>\n",
       "      <td>1.963538</td>\n",
       "      <td>6.202582</td>\n",
       "      <td>1.685048</td>\n",
       "      <td>6.191994</td>\n",
       "      <td>-1.045229</td>\n",
       "      <td>-0.602710</td>\n",
       "      <td>0.011465</td>\n",
       "      <td>0.161703</td>\n",
       "      <td>0.683672</td>\n",
       "      <td>...</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 30 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "   treatment  y_factual  y_cfactual       mu0       mu1        x1        x2  \\\n",
       "0       True   5.599916    4.318780  3.268256  6.854457 -0.528603 -0.343455   \n",
       "1      False   6.875856    7.856495  6.636059  7.562718 -1.736945 -1.802002   \n",
       "2      False   2.996273    6.633952  1.570536  6.121617 -0.807451 -0.202946   \n",
       "3      False   1.366206    5.697239  1.244738  5.889125  0.390083  0.596582   \n",
       "4      False   1.963538    6.202582  1.685048  6.191994 -1.045229 -0.602710   \n",
       "\n",
       "         x3        x4        x5  ...  x16  x17  x18  x19  x20  x21  x22  x23  \\\n",
       "0  1.128554  0.161703 -0.316603  ...    1    1    1    1    0    0    0    0   \n",
       "1  0.383828  2.244320 -0.629189  ...    1    1    1    1    0    0    0    0   \n",
       "2 -0.360898 -0.879606  0.808706  ...    1    0    1    1    0    0    0    0   \n",
       "3 -1.850350 -0.879606 -0.004017  ...    1    0    1    1    0    0    0    0   \n",
       "4  0.011465  0.161703  0.683672  ...    1    1    1    1    0    0    0    0   \n",
       "\n",
       "   x24  x25  \n",
       "0    0    0  \n",
       "1    0    0  \n",
       "2    0    0  \n",
       "3    0    0  \n",
       "4    0    0  \n",
       "\n",
       "[5 rows x 30 columns]"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "data = pd.read_csv(\"https://raw.githubusercontent.com/AMLab-Amsterdam/CEVAE/master/datasets/IHDP/csv/ihdp_npci_1.csv\", header = None)\n",
    "col =  [\"treatment\", \"y_factual\", \"y_cfactual\", \"mu0\", \"mu1\" ,]\n",
    "for i in range(1,26):\n",
    "    col.append(\"x\"+str(i))\n",
    "data.columns = col\n",
    "data = data.astype({\"treatment\":'bool'}, copy=False)\n",
    "data.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lalonde Dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A data frame with 445 observations on the following 12 variables.\n",
    "\n",
    "- age:\n",
    "age in years.\n",
    "- educ:\n",
    "years of schooling.\n",
    "- black:\n",
    "indicator variable for blacks.\n",
    "- hisp:\n",
    "indicator variable for Hispanics.\n",
    "- married:\n",
    "indicator variable for martial status.\n",
    "- nodegr:\n",
    "indicator variable for high school diploma.\n",
    "- re74:\n",
    "real earnings in 1974.\n",
    "- re75:\n",
    "real earnings in 1975.\n",
    "- re78:\n",
    "real earnings in 1978.\n",
    "- u74:\n",
    "indicator variable for earnings in 1974 being zero.\n",
    "- u75:\n",
    "indicator variable for earnings in 1975 being zero.\n",
    "- treat:\n",
    "an indicator variable for treatment status.\n",
    "\n",
    "### References\n",
    "Dehejia, Rajeev and Sadek Wahba. 1999.``Causal Effects in Non-Experimental Studies: Re-Evaluating the Evaluation of Training Programs.'' Journal of the American Statistical Association 94 (448): 1053-1062.\n",
    "\n",
    "LaLonde, Robert. 1986. ``Evaluating the Econometric Evaluations of Training Programs.'' American Economic Review 76:604-620."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "scrolled": true
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "R[write to console]: Loading required package: MASS\n",
      "\n",
      "R[write to console]: ## \n",
      "##  Matching (Version 4.9-7, Build Date: 2020-02-05)\n",
      "##  See http://sekhon.berkeley.edu/matching for additional documentation.\n",
      "##  Please cite software as:\n",
      "##   Jasjeet S. Sekhon. 2011. ``Multivariate and Propensity Score Matching\n",
      "##   Software with Automated Balance Optimization: The Matching package for R.''\n",
      "##   Journal of Statistical Software, 42(7): 1-52. \n",
      "##\n",
      "\n",
      "\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "array(['Matching', 'MASS', 'tools', 'stats', 'graphics', 'grDevices',\n",
       "       'utils', 'datasets', 'methods', 'base'], dtype='<U9')"
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from rpy2.robjects import r as R\n",
    "\n",
    "from os.path import expanduser\n",
    "home = expanduser(\"~\")\n",
    "\n",
    "%reload_ext rpy2.ipython\n",
    "\n",
    "# %R install.packages(\"Matching\")\n",
    "%R library(Matching)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>age</th>\n",
       "      <th>educ</th>\n",
       "      <th>black</th>\n",
       "      <th>hisp</th>\n",
       "      <th>married</th>\n",
       "      <th>nodegr</th>\n",
       "      <th>re74</th>\n",
       "      <th>re75</th>\n",
       "      <th>re78</th>\n",
       "      <th>u74</th>\n",
       "      <th>u75</th>\n",
       "      <th>treat</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>37</td>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>9930.05</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>22</td>\n",
       "      <td>9</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>3595.89</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>30</td>\n",
       "      <td>12</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>24909.50</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>27</td>\n",
       "      <td>11</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>7506.15</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>33</td>\n",
       "      <td>8</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0.0</td>\n",
       "      <td>0.0</td>\n",
       "      <td>289.79</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>True</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   age  educ  black  hisp  married  nodegr  re74  re75      re78  u74  u75  \\\n",
       "1   37    11      1     0        1       1   0.0   0.0   9930.05    1    1   \n",
       "2   22     9      0     1        0       1   0.0   0.0   3595.89    1    1   \n",
       "3   30    12      1     0        0       0   0.0   0.0  24909.50    1    1   \n",
       "4   27    11      1     0        0       1   0.0   0.0   7506.15    1    1   \n",
       "5   33     8      1     0        0       1   0.0   0.0    289.79    1    1   \n",
       "\n",
       "   treat  \n",
       "1   True  \n",
       "2   True  \n",
       "3   True  \n",
       "4   True  \n",
       "5   True  "
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "%R data(lalonde)\n",
    "%R -o lalonde\n",
    "lalonde = lalonde.astype({'treat':'bool'}, copy=False)\n",
    "lalonde.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 1: Building the model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IHDP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.\n",
      "INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named \"Unobserved Confounders\" to reflect this.\n",
      "INFO:dowhy.causal_model:Model to find the causal effect of treatment ['treatment'] on outcome ['y_factual']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<dowhy.causal_model.CausalModel at 0x7fafe92321d0>"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Create a causal model from the data and given common causes\n",
    "\n",
    "common_causes = []\n",
    "\n",
    "for i in range(1, 26):\n",
    "    common_causes += [\"x\"+str(i)]\n",
    "\n",
    "ihdp_model = CausalModel(\n",
    "                data=data,\n",
    "                treatment='treatment',\n",
    "                outcome='y_factual',\n",
    "                common_causes=common_causes\n",
    "            )\n",
    "ihdp_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lalonde"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "WARNING:dowhy.causal_model:Causal Graph not provided. DoWhy will construct a graph based on data inputs.\n",
      "INFO:dowhy.causal_graph:If this is observed data (not from a randomized experiment), there might always be missing confounders. Adding a node named \"Unobserved Confounders\" to reflect this.\n",
      "INFO:dowhy.causal_model:Model to find the causal effect of treatment ['treat'] on outcome ['re78']\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "<dowhy.causal_model.CausalModel at 0x7fafe9170610>"
      ]
     },
     "execution_count": 15,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "lalonde_model = CausalModel(\n",
    "                data=lalonde,\n",
    "                treatment='treat',\n",
    "                outcome='re78',\n",
    "                common_causes='nodegr+black+hisp+age+educ+married'.split('+')\n",
    "            )\n",
    "lalonde_model"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 2: Identification"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IHDP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['x25', 'x12', 'U', 'x4', 'x14', 'x5', 'x17', 'x18', 'x16', 'x9', 'x6', 'x1', 'x15', 'x19', 'x20', 'x10', 'x23', 'x21', 'x2', 'x22', 'x13', 'x11', 'x24', 'x8', 'x3', 'x7']\n",
      "WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]\n"
     ]
    }
   ],
   "source": [
    "#Identify the causal effect for the ihdp dataset\n",
    "ihdp_identified_estimand = ihdp_model.identify_effect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lalonde"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_identifier:Common causes of treatment and outcome:['hisp', 'black', 'U', 'age', 'married', 'educ', 'nodegr']\n",
      "WARNING:dowhy.causal_identifier:If this is observed data (not from a randomized experiment), there might always be missing confounders. Causal effect cannot be identified perfectly.\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "WARN: Do you want to continue by ignoring any unobserved confounders? (use proceed_when_unidentifiable=True to disable this prompt) [y/n] y\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_identifier:Instrumental variables for treatment and outcome:[]\n"
     ]
    }
   ],
   "source": [
    "#Identify the causal effect for the lalonde dataset\n",
    "lalonde_identified_estimand = lalonde_model.identify_effect()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 3: Estimation (using propensity score weighting)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IHDP"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: y_factual~treatment+x25+x12+x4+x14+x5+x17+x18+x16+x9+x6+x1+x15+x19+x20+x10+x23+x21+x2+x22+x13+x11+x24+x8+x3+x7\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Causal Estimate is 3.409737824404964\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "ihdp_estimate = ihdp_model.estimate_effect(\n",
    "                    ihdp_identified_estimand,\n",
    "                    method_name=\"backdoor.propensity_score_weighting\"\n",
    "                )\n",
    "\n",
    "print(\"The Causal Estimate is \" + str(ihdp_estimate.value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lalonde"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: re78~treat+hisp+black+age+married+educ+nodegr\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "The Causal Estimate is 1614.1203743033784\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "lalonde_estimate = lalonde_model.estimate_effect(\n",
    "                        lalonde_identified_estimand,\n",
    "                        method_name=\"backdoor.propensity_score_weighting\"\n",
    "                    )\n",
    "\n",
    "print(\"The Causal Estimate is \" + str(lalonde_estimate.value))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Step 4:  Refutation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## IHDP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add Random Common Cause"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 33,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: y_factual~treatment+x25+x12+x4+x14+x5+x17+x18+x16+x9+x6+x1+x15+x19+x20+x10+x23+x21+x2+x22+x13+x11+x24+x8+x3+x7+w_random\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Add a Random Common Cause\n",
      "Estimated effect:(3.409737824404964,)\n",
      "New effect:(3.4082516978396793,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "ihdp_refute_random_common_cause = ihdp_model.refute_estimate(\n",
    "                                        ihdp_identified_estimand,\n",
    "                                        ihdp_estimate,\n",
    "                                        method_name=\"random_common_cause\"\n",
    "                                    )\n",
    "\n",
    "print(ihdp_refute_random_common_cause)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Replace Treatment with Placebo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 35,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: y_factual~placebo+x25+x12+x4+x14+x5+x17+x18+x16+x9+x6+x1+x15+x19+x20+x10+x23+x21+x2+x22+x13+x11+x24+x8+x3+x7\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Use a Placebo Treatment\n",
      "Estimated effect:(3.409737824404964,)\n",
      "New effect:(-0.030843695817419192,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "ihdp_refute_placebo_treatment = ihdp_model.refute_estimate(\n",
    "                                    ihdp_identified_estimand,\n",
    "                                    ihdp_estimate,\n",
    "                                    method_name=\"placebo_treatment_refuter\",\n",
    "                                    placebo_type=\"permute\"\n",
    "                                )\n",
    "\n",
    "print(ihdp_refute_placebo_treatment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remove Random Subset of Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 36,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: y_factual~treatment+x25+x12+x4+x14+x5+x17+x18+x16+x9+x6+x1+x15+x19+x20+x10+x23+x21+x2+x22+x13+x11+x24+x8+x3+x7\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Use a subset of data\n",
      "Estimated effect:(3.409737824404964,)\n",
      "New effect:(3.490166414737983,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "ihdp_refute_random_subset = ihdp_model.refute_estimate(\n",
    "                                    ihdp_identified_estimand,\n",
    "                                    ihdp_estimate,\n",
    "                                    method_name=\"data_subset_refuter\",\n",
    "                                    subset_fraction=0.8\n",
    "                            )\n",
    "print(ihdp_refute_random_subset)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Lalonde"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Add Random Common Cause"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 38,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: re78~treat+hisp+black+age+married+educ+nodegr+w_random\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Add a Random Common Cause\n",
      "Estimated effect:(1614.1203743033784,)\n",
      "New effect:(1623.995451276467,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "lalonde_refute_random_common_cause = lalonde_model.refute_estimate(\n",
    "                                            lalonde_identified_estimand,\n",
    "                                            lalonde_estimate,\n",
    "                                            method_name=\"random_common_cause\"\n",
    "                                        )\n",
    "\n",
    "print(lalonde_refute_random_common_cause)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Replace Treatment with Placebo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 39,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: re78~placebo+hisp+black+age+married+educ+nodegr\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Use a Placebo Treatment\n",
      "Estimated effect:(1614.1203743033784,)\n",
      "New effect:(913.2564092536604,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "lalonde_refute_placebo_treatment = lalonde_model.refute_estimate(\n",
    "                                        lalonde_identified_estimand,\n",
    "                                        lalonde_estimate,\n",
    "                                        method_name=\"placebo_treatment_refuter\",\n",
    "                                        placebo_type=\"permute\"\n",
    "                                    )\n",
    "\n",
    "print(lalonde_refute_placebo_treatment)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Remove Random Subset of Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 40,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "INFO:dowhy.causal_estimator:INFO: Using Propensity Score Weighting Estimator\n",
      "INFO:dowhy.causal_estimator:b: re78~treat+hisp+black+age+married+educ+nodegr\n"
     ]
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Refute: Use a subset of data\n",
      "Estimated effect:(1614.1203743033784,)\n",
      "New effect:(1425.6467153243575,)\n",
      "\n"
     ]
    },
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/tanmay/model_env/lib/python3.7/site-packages/sklearn/utils/validation.py:760: DataConversionWarning: A column-vector y was passed when a 1d array was expected. Please change the shape of y to (n_samples, ), for example using ravel().\n",
      "  y = column_or_1d(y, warn=True)\n"
     ]
    }
   ],
   "source": [
    "lalonde_refute_random_subset = lalonde_model.refute_estimate(\n",
    "                                    lalonde_identified_estimand,\n",
    "                                    lalonde_estimate,\n",
    "                                    method_name=\"data_subset_refuter\",\n",
    "                                    subset_fraction=0.9\n",
    "                                )\n",
    "\n",
    "print(lalonde_refute_random_subset)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.5"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}