{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# DoWhy example on the Lalonde dataset\n", "\n", "Thanks to [@mizuy](https://github.com/mizuy) for providing this example. Here we apply the inverse propensity weighting (IPW) estimator to the Lalonde dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os, sys\n", "sys.path.append(os.path.abspath(\"../../../\"))\n", "\n", "import dowhy\n", "from dowhy import CausalModel\n", "from rpy2.robjects import r as R\n", "%load_ext rpy2.ipython\n", "\n", "#%R install.packages(\"Matching\")\n", "%R library(Matching)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 1. Load the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%R data(lalonde)\n", "%R -o lalonde\n", "lalonde = lalonde.astype({'treat':'bool'}, copy=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 2. Run DoWhy analysis: model, identify, estimate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "model = CausalModel(\n", "    data=lalonde,\n", "    treatment='treat',\n", "    outcome='re78',\n", "    common_causes='nodegr+black+hisp+age+educ+married'.split('+'))\n", "identified_estimand = model.identify_effect(proceed_when_unidentifiable=True)\n", "estimate = model.estimate_effect(identified_estimand,\n", "    method_name=\"backdoor.propensity_score_weighting\",\n", "    target_units=\"ate\",\n", "    method_params={\"weighting_scheme\":\"ips_weight\"})\n", "#print(estimate)\n", "print(\"Causal Estimate is \" + str(estimate.value))\n", "\n", "# Cross-check: weighted least squares of the outcome on treatment, using the stabilized IPS weights column\n", "import statsmodels.formula.api as smf\n", "reg = smf.wls('re78~1+treat', data=lalonde, weights=lalonde.ips_stabilized_weight)\n", "res = reg.fit()\n", "res.summary()" ] },
{ "cell_type": "markdown", "metadata": {}, "source": [ "## 3. Interpret the estimate\n", "The plot below shows how the distribution of a confounder, \"married\", changes from the original data to the weighted data. In both datasets, we compare the distribution of \"married\" across treated and untreated units." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "estimate.interpret(method_name=\"confounder_distribution_interpreter\", var_type='discrete',\n", "    var_name='married', fig_size=(10, 7), font_size=12)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## 4. Sanity check: compare to manual IPW estimate" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "df = model._data\n", "ps = df['propensity_score']\n", "y = df['re78']\n", "z = df['treat']\n", "\n", "# Normalized IPW estimate: weighted mean outcome of treated units minus that of untreated units\n", "ey1 = z*y/ps / sum(z/ps)\n", "ey0 = (1-z)*y/(1-ps) / sum((1-z)/(1-ps))\n", "ate = ey1.sum()-ey0.sum()\n", "print(\"Causal Estimate is \" + str(ate))\n", "\n", "# correct -> Causal Estimate is 1634.9868359746906" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.8.12" }, "toc": { "base_numbering": 1, "nav_menu": {}, "number_sections": false, "sideBar": true, "skip_h1_title": true, "title_cell": "Table of Contents", "title_sidebar": "Contents", "toc_cell": false, "toc_position": {}, "toc_section_display": true, "toc_window_display": false } }, "nbformat": 4, "nbformat_minor": 2 }