{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# DoWhy example on the Lalonde dataset\n",
    "\n",
    "Thanks to [@mizuy](https://github.com/mizuy) for providing this example. Here we use the Lalonde dataset and apply IPW estimator to it. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 1. Load the data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import dowhy.datasets\n",
    "\n",
    "lalonde = dowhy.datasets.lalonde_dataset()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 2. Run DoWhy analysis: model, identify, estimate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "from dowhy import CausalModel\n",
    "\n",
    "\n",
    "model=CausalModel(\n",
    "        data = lalonde,\n",
    "        treatment='treat',\n",
    "        outcome='re78',\n",
    "        common_causes='nodegr+black+hisp+age+educ+married'.split('+'))\n",
    "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n",
    "estimate = model.estimate_effect(identified_estimand,\n",
    "        method_name=\"backdoor.propensity_score_weighting\",\n",
    "        target_units=\"ate\",                         \n",
    "        method_params={\"weighting_scheme\":\"ips_weight\"})\n",
    "\n",
    "print(\"Causal Estimate is \" + str(estimate.value))\n",
    "\n",
    "import statsmodels.formula.api as smf\n",
    "reg=smf.wls('re78~1+treat', data=lalonde, weights=lalonde.ips_stabilized_weight)\n",
    "res=reg.fit()\n",
    "res.summary()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 3. Interpret the estimate\n",
    "The plot below shows how the distribution of a confounder, \"married\" changes from the original data to the weighted data. In both datasets, we compare the distribution of \"married\" across treated and untreated units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "estimate.interpret(method_name=\"confounder_distribution_interpreter\",var_type='discrete',\n",
    "                   var_name='married', fig_size = (10, 7), font_size = 12)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 4. Sanity check: compare to manual IPW estimate"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "df = model._data\n",
    "ps = df['propensity_score']\n",
    "y = df['re78']\n",
    "z = df['treat']\n",
    "\n",
    "ey1 = z*y/ps / sum(z/ps)\n",
    "ey0 = (1-z)*y/(1-ps) / sum((1-z)/(1-ps))\n",
    "ate = ey1.sum()-ey0.sum()\n",
    "print(\"Causal Estimate is \" + str(ate))\n",
    "\n",
    "# correct -> Causal Estimate is 1634.9868359746906"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.12"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": false,
   "sideBar": true,
   "skip_h1_title": true,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}