dowhy package

Subpackages

Submodules

dowhy.causal_estimator module

class dowhy.causal_estimator.CausalEstimate(data, treatment_name, outcome_name, estimate, target_estimand, realized_estimand_expr, control_value, treatment_value, conditional_estimates=None, **kwargs)[source]

Bases: object

Class for the estimate object that every causal estimator returns

add_effect_strength(strength_dict)[source]
add_estimator(estimator_instance)[source]
add_params(**kwargs)[source]
estimate_conditional_effects(effect_modifiers=None, num_quantiles=5)[source]

Estimate treatment effect conditioned on given variables.

If a numeric effect modifier is provided, it is discretized into quantile bins. If you would like a custom discretization, you can do so yourself: create a new column containing the discretized effect modifier and then include that column’s name in the effect_modifier_names argument.

Parameters:
  • effect_modifiers – Names of effect modifier variables over which the conditional effects will be estimated. If not provided, defaults to the effect modifiers specified during creation of the CausalEstimator object.

  • num_quantiles – The number of quantiles into which a numeric effect modifier variable is discretized. Does not affect any categorical effect modifiers.

Returns:

A (multi-index) dataframe that provides separate effects for each value of the (discretized) effect modifiers.

get_confidence_intervals(confidence_level=None, method=None, **kwargs)[source]

Get confidence intervals of the obtained estimate.

By default, this is done with the help of bootstrapped confidence intervals but can be overridden if the specific estimator implements other methods of estimating confidence intervals.

If the method provided is not bootstrap, this function calls the implementation of the specific estimator.

Parameters:
  • method – Method for estimating confidence intervals.

  • confidence_level – The confidence level of the confidence intervals of the estimate.

  • kwargs – Other optional args to be passed to the CI method.

Returns:

The obtained confidence interval.

get_standard_error(method=None, **kwargs)[source]

Get standard error of the obtained estimate.

By default, this is done with the help of bootstrapped standard errors but can be overridden if the specific estimator implements other methods of estimating standard error.

If the method provided is not bootstrap, this function calls the implementation of the specific estimator.

Parameters:
  • method – Method for computing the standard error.

  • kwargs – Other optional parameters to be passed to the estimating method.

Returns:

Standard error of the causal estimate.

interpret(method_name=None, **kwargs)[source]

Interpret the causal estimate.

Parameters:
  • method_name – Method used (string) or a list of methods. If None, then the default for the specific estimator is used.

  • kwargs: – Optional parameters that are directly passed to the interpreter method.

Returns:

None

test_stat_significance(method=None, **kwargs)[source]

Test statistical significance of the estimate obtained.

By default, uses resampling to create a non-parametric significance test. Individual child estimators can implement different methods. If the method name is different from “bootstrap”, this function calls the implementation of the child estimator.

Parameters:
  • method – Method for checking statistical significance

  • kwargs – Other optional parameters to be passed to the estimating method.

Returns:

p-value from the significance test

class dowhy.causal_estimator.CausalEstimator(identified_estimand: IdentifiedEstimand, test_significance: Union[bool, str] = False, evaluate_effect_strength: bool = False, confidence_intervals: bool = False, num_null_simulations: int = 1000, num_simulations: int = 399, sample_size_fraction: int = 1, confidence_level: float = 0.95, need_conditional_estimates: Union[bool, str] = 'auto', num_quantiles_to_discretize_cont_cols: int = 5, **_)[source]

Bases: object

Base class for an estimator of causal effect.

Subclasses implement different estimation methods. All estimation methods are in the package “dowhy.causal_estimators”

Initializes an estimator with data and names of relevant variables.

This method is called from the constructors of its child classes.

Parameters:
  • identified_estimand – probability expression representing the target identified estimand to estimate.

  • test_significance – Binary flag or a string indicating whether to test significance and by which method. All estimators support test_significance=”bootstrap” that estimates a p-value for the obtained estimate using the bootstrap method. Individual estimators can override this to support custom testing methods. The bootstrap method supports an optional parameter, num_null_simulations. If False, no testing is done. If True, significance of the estimate is tested using the custom method if available, otherwise by bootstrap.

  • evaluate_effect_strength – (Experimental) whether to evaluate the strength of effect

  • confidence_intervals – Binary flag or a string indicating whether the confidence intervals should be computed and which method should be used. All methods support estimation of confidence intervals using the bootstrap method by using the parameter confidence_intervals=”bootstrap”. The bootstrap method takes in two arguments (num_simulations and sample_size_fraction) that can be optionally specified in the params dictionary. Estimators may also override this to implement their own confidence interval method. If this parameter is False, no confidence intervals are computed. If True, confidence intervals are computed by the estimator’s specific method if available, otherwise through bootstrap

  • num_null_simulations – The number of simulations for testing the statistical significance of the estimator

  • num_simulations – The number of simulations for finding the confidence interval (and/or standard error) for an estimate

  • sample_size_fraction – The size of the sample for the bootstrap estimator

  • confidence_level – The confidence level of the confidence interval estimate

  • need_conditional_estimates – Boolean flag indicating whether conditional estimates should be computed. Defaults to True if there are effect modifiers in the graph

  • num_quantiles_to_discretize_cont_cols – The number of quantiles into which a numeric effect modifier is split, to enable estimation of conditional treatment effect over it.

  • kwargs – (optional) Additional estimator-specific parameters

Returns:

an instance of the estimator class.

class BootstrapEstimates(estimates, params)

Bases: tuple

Create new instance of BootstrapEstimates(estimates, params)

estimates

Alias for field number 0

params

Alias for field number 1

DEFAULT_CONFIDENCE_LEVEL = 0.95
DEFAULT_INTERPRET_METHOD = ['textual_effect_interpreter']
DEFAULT_NOTIMPLEMENTEDERROR_MSG = 'not yet implemented for {0}. If you would like this to be implemented in the next version, please raise an issue at https://github.com/microsoft/dowhy/issues'
DEFAULT_NUMBER_OF_SIMULATIONS_CI = 399
DEFAULT_NUMBER_OF_SIMULATIONS_STAT_TEST = 1000
DEFAULT_SAMPLE_SIZE_FRACTION = 1
NUM_QUANTILES_TO_DISCRETIZE_CONT_COLS = 5
TEMP_CAT_COLUMN_PREFIX = '__categorical__'
construct_symbolic_estimator(estimand)[source]
do(x, data_df=None)[source]

Method that implements the do-operator.

Given a value x for the treatment, returns the expected value of the outcome when the treatment is intervened to a value x.

Parameters:
  • x – Value of the treatment

  • data_df – Data on which the do-operator is to be applied.

Returns:

Value of the outcome when treatment is intervened/set to x.

estimate_confidence_intervals(data: DataFrame, estimate_value, confidence_level=None, method=None, **kwargs)[source]

Find the confidence intervals corresponding to any estimator. By default, this is done with the help of bootstrapped confidence intervals but can be overridden if the specific estimator implements other methods of estimating confidence intervals.

If the method provided is not bootstrap, this function calls the implementation of the specific estimator.

Parameters:
  • estimate_value – obtained estimate’s value

  • method – Method for estimating confidence intervals.

  • confidence_level – The confidence level of the confidence intervals of the estimate.

  • kwargs – Other optional args to be passed to the CI method.

Returns:

The obtained confidence interval.

estimate_effect_naive(data: DataFrame)[source]
Parameters:

data – Pandas dataframe on which to estimate the effect

estimate_std_error(data: DataFrame, method=None, **kwargs)[source]

Compute standard error of an obtained causal estimate.

Parameters:
  • method – Method for computing the standard error.

  • kwargs – Other optional parameters to be passed to the estimating method.

Returns:

Standard error of the causal estimate.

evaluate_effect_strength(data: DataFrame, estimate)[source]
get_new_estimator_object(identified_estimand, test_significance=False, evaluate_effect_strength=False, confidence_intervals=None)[source]

Create a new estimator object of the same type as the current one.

Creates a new object with the identified_estimand

Parameters:

identified_estimand – IdentifiedEstimand An instance of the identified estimand class that provides the information with respect to which causal pathways are employed when the treatment affects the outcome

Returns:

A new instance of the same estimator class that had generated the given estimate.

static is_bootstrap_parameter_changed(bootstrap_estimates_params, given_params)[source]

Check whether parameters of the bootstrap have changed.

This is an efficiency method that checks if fresh resampling of the bootstrap samples is required. Returns True if parameters have changed and resampling should be done again.

Parameters:
  • bootstrap_estimates_params – A dictionary of parameters for the current bootstrap samples

  • given_params – A dictionary of parameters passed by the user

Returns:

A binary flag denoting whether the parameters are different.

reset_encoders()[source]

Removes any reference to data encoders, causing them to be re-created on next fit().

It’s important that data is consistently encoded otherwise models will produce inconsistent output. In particular, categorical variables are one-hot encoded; the mapping of original data values must be identical between model training/fitting and inference time.

Encoders are reset when fit() is called again, as the data is assumed to have changed.

A separate encoder is used for each subset of variables (treatment, common causes and effect modifiers).

signif_results_tostr(signif_results)[source]
target_units_tostr()[source]
test_significance(data: DataFrame, estimate_value, method=None, **kwargs)[source]

Test statistical significance of obtained estimate.

By default, uses resampling to create a non-parametric significance test, a general procedure. Individual child estimators can implement different methods. If the method name is different from “bootstrap”, this function calls the implementation of the child estimator.

Parameters:
  • self – object instance of class Estimator

  • estimate_value – obtained estimate’s value

  • method – Method for checking statistical significance

Returns:

p-value from the significance test

update_input(treatment_value, control_value, target_units)[source]
class dowhy.causal_estimator.RealizedEstimand(identified_estimand, estimator_name)[source]

Bases: object

update_assumptions(estimator_assumptions)[source]
update_estimand_expression(estimand_expression)[source]
dowhy.causal_estimator.estimate_effect(data: DataFrame, treatment: Union[str, List[str]], outcome: Union[str, List[str]], identifier_name: str, estimator: CausalEstimator, control_value: int = 0, treatment_value: int = 1, target_units: str = 'ate', effect_modifiers: Optional[List[str]] = None, fit_estimator: bool = True, method_params: Optional[Dict] = None)[source]

Estimate the identified causal effect.

In addition, you can directly call any of the EconML estimation methods. The convention is “backdoor.econml.path-to-estimator-class”. For example, for the double machine learning estimator (“DML” class) that is located inside “dml” module of EconML, you can use the method name, “backdoor.econml.dml.DML”. CausalML estimators can also be called. See this demo notebook.

Parameters:
  • treatment – Name of the treatment

  • outcome – Name of the outcome

  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • estimator – Instance of a CausalEstimator to use

  • control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.

  • treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.

  • target_units – (Experimental) The units for which the treatment effect should be estimated. This can be of three types. (1) a string for common specifications of target units (namely, “ate”, “att” and “atc”), (2) a lambda function that can be used as an index for the data (pandas DataFrame), or (3) a new DataFrame that contains values of the effect_modifiers; the effect will be estimated only for this new data.

  • effect_modifiers – Names of effect modifier variables can be (optionally) specified here too, since they do not affect identification. If None, the effect_modifiers from the CausalModel are used.

  • fit_estimator – Boolean flag on whether to fit the estimator. Setting it to False is useful to estimate the effect on new data using a previously fitted estimator.

Returns:

An instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

dowhy.causal_graph module

class dowhy.causal_graph.CausalGraph(treatment_name, outcome_name, graph=None, common_cause_names=None, instrument_names=None, effect_modifier_names=None, mediator_names=None, observed_node_names=None, missing_nodes_as_confounders=False)[source]

Bases: object

Class for creating and modifying the causal graph.

Accepts a networkx DiGraph, a ProbabilisticCausalModel (dowhy.gcm.ProbabilisticCausalModel), or a graph string (or a text file) in gml format (preferred) or dot format. Graphviz-like attributes can be set for edges and nodes. E.g. style=”dashed” as an edge attribute ensures that the edge is drawn with a dashed line.

If a graph string is not given, names of treatment, outcome, and confounders, instruments and effect modifiers (if any) can be provided to create the graph.

add_missing_nodes_as_common_causes(observed_node_names)[source]
add_node_attributes(observed_node_names)[source]
add_unobserved_common_cause(observed_node_names, color='gray')[source]
all_observed(node_names)[source]
build_graph(common_cause_names, instrument_names, effect_modifier_names, mediator_names)[source]

Creates nodes and edges based on variable names and their semantics.

Currently only considers the graphical representation of “direct” effect modifiers. Thus, all effect modifiers are assumed to be “direct” unless otherwise expressed using a graph. Based on the taxonomy of effect modifiers by VanderWeele and Robins: “Four types of effect modification: A classification based on directed acyclic graphs. Epidemiology. 2007.”

check_dseparation(nodes1, nodes2, nodes3, new_graph=None, dseparation_algo='default')[source]
check_valid_backdoor_set(nodes1, nodes2, nodes3, backdoor_paths=None, new_graph=None, dseparation_algo='default')[source]

Assume that the first parameter (nodes1) is the treatment, the second is the outcome, and the third is the candidate backdoor set

check_valid_frontdoor_set(nodes1, nodes2, candidate_nodes, frontdoor_paths=None, new_graph=None, dseparation_algo='default')[source]

Check if the candidate nodes are a valid frontdoor set for the set of treatments (nodes1) to the set of outcomes (nodes2).

check_valid_mediation_set(nodes1, nodes2, candidate_nodes, mediation_paths=None)[source]

Check if the candidate nodes are valid mediators for the set of treatments (nodes1) to the set of outcomes (nodes2).

do_surgery(node_names, remove_outgoing_edges=False, remove_incoming_edges=False, target_node_names=None, remove_only_direct_edges_to_target=False)[source]

Method to create a new graph based on the concept of do-surgery.

Parameters:
  • node_names – focal nodes for the surgery

  • remove_outgoing_edges – whether to remove outgoing edges from the focal nodes

  • remove_incoming_edges – whether to remove incoming edges to the focal nodes

  • target_node_names – target nodes (optional) for the surgery, only used when remove_only_direct_edges_to_target is True

  • remove_only_direct_edges_to_target – whether to remove only the direct edges from focal nodes to the target nodes

Returns:

a new networkx graph after the specified removal of edges

filter_unobserved_variables(node_names)[source]
get_adjacency_matrix(*args, **kwargs)[source]

Get adjacency matrix from the networkx graph

get_all_directed_paths(nodes1, nodes2)[source]

Get all directed paths between sets of nodes.

Currently only supports singleton sets.

get_all_nodes(include_unobserved=True)[source]
get_ancestors(node_name, new_graph=None)[source]
get_backdoor_paths(nodes1, nodes2)[source]
get_causes(nodes, remove_edges=None)[source]
get_common_causes(nodes1, nodes2)[source]

Assume that nodes1 causes nodes2 (e.g., nodes1 are the treatments and nodes2 are the outcomes)

get_descendants(nodes)[source]
get_effect_modifiers(nodes1, nodes2)[source]
get_instruments(treatment_nodes, outcome_nodes)[source]
get_parents(node_name)[source]
get_unconfounded_observed_subgraph()[source]
has_directed_path(nodes1, nodes2)[source]

Checks if there is any directed path between two sets of nodes.

Currently only supports singleton sets.

is_blocked(path, conditioned_nodes)[source]

Uses d-separation criteria to decide if conditioned_nodes block given path.

view_graph(layout=None, size=None, file_name='causal_model')[source]

dowhy.causal_model module

Module containing the main model class for the dowhy package.

class dowhy.causal_model.CausalModel(data, treatment, outcome, graph=None, common_causes=None, instruments=None, effect_modifiers=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, missing_nodes_as_confounders=False, identify_vars=False, **kwargs)[source]

Bases: object

Main class for storing the causal model state.

Initialize data and create a causal graph instance.

Assigns treatment and outcome variables. Also checks and finds the common causes and instruments for treatment and outcome.

At least one of graph, common_causes or instruments must be provided. If none of these variables are provided, then learn_graph() can be used later.

Parameters:
  • data – a pandas dataframe containing treatment, outcome and other variables.

  • treatment – name of the treatment variable

  • outcome – name of the outcome variable

  • graph – path to DOT file containing a DAG or a string containing a DAG specification in DOT format

  • common_causes – names of common causes of treatment and outcome. Only used when graph is None.

  • instruments – names of instrumental variables for the effect of treatment on outcome. Only used when graph is None.

  • effect_modifiers – names of variables that can modify the treatment effect. If not provided, then the causal graph is used to find the effect modifiers. Estimators will return multiple different estimates based on each value of effect_modifiers.

  • estimand_type – the type of estimand requested (currently only “nonparametric-ate” is supported). In the future, may support other specific parametric forms of identification.

  • proceed_when_unidentifiable – Binary flag indicating whether identification should proceed by ignoring potential unobserved confounders.

  • missing_nodes_as_confounders – Binary flag indicating whether variables in the dataframe that are not included in the causal graph, should be automatically included as confounder nodes.

  • identify_vars – Binary flag deciding whether to compute common causes, instruments, and effect modifiers while initializing the class. identify_vars should be set to False when the user provides common_causes, instruments, or effect_modifiers themselves (otherwise the identify_vars code can override the user-provided values). It also has no effect if no graph is given.

Returns:

an instance of CausalModel class

do(x, identified_estimand, method_name=None, fit_estimator=True, method_params=None)[source]

Do operator for estimating values of the outcome after intervening on treatment.

Parameters:
  • x – interventional value of the treatment variable

  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – the estimation method to be used. See the docs for the estimate_effect method for a list of supported estimation methods.

  • fit_estimator – Boolean flag on whether to fit the estimator. Setting it to False is useful to compute the do-operation on new data using a previously fitted estimator.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method.

Returns:

an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

estimate_effect(identified_estimand, method_name=None, control_value=0, treatment_value=1, test_significance=None, evaluate_effect_strength=False, confidence_intervals=False, target_units='ate', effect_modifiers=None, fit_estimator=True, method_params=None)[source]

Estimate the identified causal effect.

Currently requires an explicit method name to be specified. Method names follow the convention of identification method followed by the specific estimation method: “[backdoor/iv/frontdoor].estimation_method_name”. For a list of supported methods, check out the User Guide. Here are some examples.

  • Propensity Score Matching: “backdoor.propensity_score_matching”

  • Propensity Score Stratification: “backdoor.propensity_score_stratification”

  • Propensity Score-based Inverse Weighting: “backdoor.propensity_score_weighting”

  • Linear Regression: “backdoor.linear_regression”

  • Generalized Linear Models (e.g., logistic regression): “backdoor.generalized_linear_model”

  • Instrumental Variables: “iv.instrumental_variable”

  • Regression Discontinuity: “iv.regression_discontinuity”

  • Two Stage Regression: “frontdoor.two_stage_regression”

In addition, you can directly call any of the EconML estimation methods. The convention is “[backdoor/iv].econml.path-to-estimator-class”. For example, for the double machine learning estimator (“DML” class) that is located inside “dml” module of EconML, you can use the method name, “backdoor.econml.dml.DML”. See this demo notebook.

Parameters:
  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – name of the estimation method to be used.

  • control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.

  • treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.

  • test_significance – Binary flag on whether to additionally do a statistical significance test for the estimate.

  • evaluate_effect_strength – (Experimental) Binary flag on whether to estimate the relative strength of the treatment’s effect. This measure can be used to compare different treatments for the same outcome (by running this method with different treatments sequentially).

  • confidence_intervals – (Experimental) Binary flag indicating whether confidence intervals should be computed.

  • target_units – (Experimental) The units for which the treatment effect should be estimated. This can be of three types. (1) a string for common specifications of target units (namely, “ate”, “att” and “atc”), (2) a lambda function that can be used as an index for the data (pandas DataFrame), or (3) a new DataFrame that contains values of the effect_modifiers; the effect will be estimated only for this new data.

  • effect_modifiers – Names of effect modifier variables can be (optionally) specified here too, since they do not affect identification. If None, the effect_modifiers from the CausalModel are used.

  • fit_estimator – Boolean flag on whether to fit the estimator. Setting it to False is useful to estimate the effect on new data using a previously fitted estimator.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method. See the docs for each estimation method for allowed method-specific params.

Returns:

An instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

get_common_causes()[source]
get_effect_modifiers()[source]
get_estimator(method_name)[source]

Retrieves an existing CausalEstimator object matching the given method_name.

CausalEstimator objects are created in estimate_effect() and stored in a cache for reuse. Different instances can be created for different methods. They may be reused multiple times on different data with estimate_effect(fit_estimator=False). This is useful for e.g. estimating effects on different samples of the same dataset.

The CausalEstimate object returned by estimate_effect() also has a reference to the CausalEstimator object used to produce it:

effect = model.estimate_effect(…)
effect.estimator  # returns the fitted CausalEstimator object

Parameters:

method_name – name of the estimation method to be used.

Returns:

An instance of CausalEstimator for the given method, if it exists, or None.

get_instruments()[source]
identify_effect(estimand_type=None, method_name='default', proceed_when_unidentifiable=None, optimize_backdoor=False)[source]

Identify the causal effect to be estimated, using properties of the causal graph.

Parameters:
  • method_name – Method name for identification algorithm. (“id-algorithm” or “default”)

  • proceed_when_unidentifiable – Binary flag indicating whether identification should proceed in the presence of (potential) unobserved confounders.

Returns:

a probability expression (estimand) for the causal effect if identified, else None

init_graph(graph, identify_vars)[source]

Initialize self._graph using graph provided by the user.

interpret(method_name=None, **kwargs)[source]

Interpret the causal model.

Parameters:
  • method_name – method used for interpreting the model. If None, then default interpreter is chosen that describes the model summary and shows the associated causal graph.

  • kwargs: – Optional parameters that are directly passed to the interpreter method.

Returns:

None

learn_graph(method_name='cdt.causality.graph.LiNGAM', *args, **kwargs)[source]

Learn causal graph from the data. This function takes the method name as input and initializes the causal graph object using the learnt graph.

Parameters:
  • self – instance of the CausalModel class (or its subclass)

  • method_name – Exact method name of the object to be imported from the concerned library.

Returns:

an instance of the CausalGraph class initialized with the learned graph.

refute_estimate(estimand, estimate, method_name=None, show_progress_bar=False, **kwargs)[source]

Refute an estimated causal effect.

If method_name is provided, uses the provided method. In the future, we may support automatic selection of suitable refutation tests. The following refutation methods are supported.
  • Adding a randomly-generated confounder: “random_common_cause”

  • Adding a confounder that is associated with both treatment and outcome: “add_unobserved_common_cause”

  • Replacing the treatment with a placebo (random) variable: “placebo_treatment_refuter”

  • Removing a random subset of the data: “data_subset_refuter”

Parameters:
  • estimand – target estimand, an instance of the IdentifiedEstimand class (typically, the output of identify_effect)

  • estimate – estimate to be refuted, an instance of the CausalEstimate class (typically, the output of estimate_effect)

  • method_name – name of the refutation method

  • show_progress_bar – Boolean flag on whether to show a progress bar

  • kwargs – (optional) additional arguments that are passed directly to the refutation method. Can specify a random seed here to ensure reproducible results (‘random_seed’ parameter). For method-specific parameters, consult the documentation for the specific method. All refutation methods are in the causal_refuters subpackage.

Returns:

an instance of the RefuteResult class

refute_graph(k=1, independence_test=None, independence_constraints=None)[source]

Check if the dependencies in the input graph match the dataset: ( X ⫫ Y ) | Z, where X and Y are currently considered as singleton sets and Z can contain multiple variables.

Parameters:
  • k – number of covariates in set Z

  • independence_test – dictionary containing methods to test conditional independence in data

  • independence_constraints – list of implications to be tested, input by the user in the format [(x, y, (z1, z2)), (x, y, (z3,))]

Returns:

an instance of the GraphRefuter class

summary(print_to_stdout=False)[source]

Print a text summary of the model.

Returns:

a string containing the summary

view_model(layout=None, size=(8, 6), file_name='causal_model')[source]

View the causal DAG.

Parameters:
  • layout – string specifying the layout of the graph.

  • size – tuple (x, y) specifying the width and height of the figure in inches.

  • file_name – string specifying the file name for the saved causal graph png.

Returns:

a visualization of the graph

dowhy.causal_refuter module

class dowhy.causal_refuter.CausalRefutation(estimated_effect, new_effect, refutation_type)[source]

Bases: object

Class for storing the result of a refutation method.

add_refuter(refuter_instance)[source]
add_significance_test_results(refutation_result)[source]
interpret(method_name=None, **kwargs)[source]

Interpret the refutation results.

Parameters:

method_name – Method used (string) or a list of methods. If None, then the default for the specific refuter is used.

Returns:

None

class dowhy.causal_refuter.CausalRefuter(data, identified_estimand, estimate, **kwargs)[source]

Bases: object

Base class for different refutation methods.

Subclasses implement specific refutation methods.

# todo: add docstring for common parameters here and remove from child refuter classes

This class is kept for backwards compatibility with CausalModel. It will be deprecated in the future in favor of the refute_method_name() functions.

DEFAULT_NUM_SIMULATIONS = 100
PROGRESS_BAR_COLOR = 'green'
choose_variables(required_variables)[source]
perform_bootstrap_test(estimate, simulations)[source]
perform_normal_distribution_test(estimate, simulations)[source]
refute_estimate(show_progress_bar=False)[source]
test_significance(estimate, simulations, test_type='auto', significance_level=0.05)[source]
class dowhy.causal_refuter.SignificanceTestType(value)[source]

Bases: Enum

An enumeration.

AUTO = 'auto'
BOOTSTRAP = 'bootstrap'
NORMAL = 'normal_test'
dowhy.causal_refuter.choose_variables(required_variables: Union[bool, int, list], variables_of_interest: List)[source]

This method provides a way to choose the confounders whose values we wish to modify, in order to find their effect on the ability of the treatment to affect the outcome.

dowhy.causal_refuter.perform_bootstrap_test(estimate, simulations: List)[source]
dowhy.causal_refuter.perform_normal_distribution_test(estimate, simulations: List)[source]
dowhy.causal_refuter.test_significance(estimate, simulations: List, test_type: SignificanceTestType = SignificanceTestType.AUTO, significance_level: float = 0.05)[source]

Tests the statistical significance of the obtained estimate against the simulations produced by a refuter.

The rationale for using the refuter's sample statistics to test the estimate is that, ideally, we expect them to follow the same distribution.

For refutation tests (e.g., placebo refuters), consider the null distribution as a distribution of effect estimates over multiple simulations with placebo treatment, and compute how likely the true estimate (e.g., zero for the placebo test) is under the null. If the probability of the true effect estimate under the null is lower than the significance level, then the estimator method fails the test.

For sensitivity analysis tests (e.g., bootstrap, subset or common cause refuters), the null distribution captures the distribution of effect estimates under the “true” dataset (e.g., with an additional confounder or different sampling), and we compute the probability of the obtained estimate under this distribution. If that probability is lower than the significance level, then the estimator method fails the test.

Null hypothesis: the estimate is part of the distribution. Alternative hypothesis: the estimate does not fall in the distribution.

Parameters:
  • estimate – CausalEstimate. The estimate obtained from the estimator for the original data.

  • simulations – np.array. An array containing the results of the refuter for the simulations.

  • test_type – string, default 'auto'. The type of test the user wishes to perform.

  • significance_level – float, default 0.05. The significance level for the statistical test.

Returns:

significance_dict: a Dict containing the p_value and a boolean that indicates whether the result is statistically significant
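The normal-distribution variant of this test can be sketched with plain numpy (a simplified illustration, not DoWhy's actual implementation; the function name and return keys are made up for this example):

```python
import numpy as np
from math import erf, sqrt

def normal_significance_test(estimate_value, simulations, significance_level=0.05):
    # Treat the refuter's simulated estimates as draws from the null
    # distribution and fit a normal to them
    mean = np.mean(simulations)
    std = np.std(simulations, ddof=1)
    z = (estimate_value - mean) / std
    # Two-sided p-value: 2 * (1 - Phi(|z|)), with Phi the standard normal CDF
    p_value = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return {"p_value": p_value,
            "is_statistically_significant": p_value <= significance_level}
```

An estimate far in the tail of the simulated null distribution yields a small p-value, i.e. a statistically significant difference from the refuter's simulations.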

dowhy.data_transformer module

class dowhy.data_transformer.DimensionalityReducer(data_array, ndims, **kwargs)[source]

Bases: object

reduce(target_dimensions=None)[source]

dowhy.datasets module

Module for generating some sample datasets.

dowhy.datasets.choice(a, size=None, replace=True, p=None)

Generates a random sample from a given 1-D array

New in version 1.7.0.

Note

New code should use the choice method of a numpy.random.Generator instance instead; see NumPy's random quick start guide.

Parameters

a : 1-D array-like or int

If an ndarray, a random sample is generated from its elements. If an int, the random sample is generated as if it were np.arange(a)

size : int or tuple of ints, optional

Output shape. If the given shape is, e.g., (m, n, k), then m * n * k samples are drawn. Default is None, in which case a single value is returned.

replace : boolean, optional

Whether the sample is with or without replacement. Default is True, meaning that a value of a can be selected multiple times.

p : 1-D array-like, optional

The probabilities associated with each entry in a. If not given, the sample assumes a uniform distribution over all entries in a.

Returns

samples : single item or ndarray

The generated random samples

Raises

ValueError

If a is an int and less than zero, if a or p are not 1-dimensional, if a is an array-like of size 0, if p is not a vector of probabilities, if a and p have different lengths, or if replace=False and the sample size is greater than the population size

See Also

randint, shuffle, permutation, random.Generator.choice (which should be used in new code)

Notes

Setting user-specified probabilities through p uses a more general but less efficient sampler than the default. The general sampler produces a different sample than the optimized sampler even if each element of p is 1 / len(a).

Sampling random rows from a 2-D array is not possible with this function, but is possible with Generator.choice through its axis keyword.

Examples

Generate a uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3)
array([0, 3, 4]) # random
>>> #This is equivalent to np.random.randint(0,5,3)

Generate a non-uniform random sample from np.arange(5) of size 3:

>>> np.random.choice(5, 3, p=[0.1, 0, 0.3, 0.6, 0])
array([3, 3, 0]) # random

Generate a uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False)
array([3,1,0]) # random
>>> #This is equivalent to np.random.permutation(np.arange(5))[:3]

Generate a non-uniform random sample from np.arange(5) of size 3 without replacement:

>>> np.random.choice(5, 3, replace=False, p=[0.1, 0, 0.3, 0.6, 0])
array([2, 3, 0]) # random

Any of the above can be repeated with an arbitrary array-like instead of just integers. For instance:

>>> aa_milne_arr = ['pooh', 'rabbit', 'piglet', 'Christopher']
>>> np.random.choice(aa_milne_arr, 5, p=[0.5, 0.1, 0.1, 0.3])
array(['pooh', 'pooh', 'pooh', 'Christopher', 'piglet'], # random
      dtype='<U11')
dowhy.datasets.construct_col_names(name, num_vars, num_discrete_vars, num_discrete_levels, one_hot_encode)[source]
dowhy.datasets.convert_continuous_to_discrete(arr)[source]
dowhy.datasets.convert_to_binary(x, stochastic=True)[source]
dowhy.datasets.convert_to_categorical(arr, num_vars, num_discrete_vars, quantiles=[0.25, 0.5, 0.75], one_hot_encode=False)[source]
dowhy.datasets.create_discrete_column(num_samples, std_dev=1)[source]
dowhy.datasets.create_dot_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[], frontdoor_variables=[])[source]
dowhy.datasets.create_gml_graph(treatments, outcome, common_causes, instruments, effect_modifiers=[], frontdoor_variables=[])[source]
dowhy.datasets.dataset_from_random_graph(num_vars, num_samples=1000, prob_edge=0.3, random_seed=None, prob_type_of_data=(0.333, 0.333, 0.334))[source]

This function generates a dataset with discrete and continuous variables. It creates a random graph and models the variables linearly according to the relations in the graph.

Parameters:
  • num_vars – Number of variables in the dataset

  • num_samples – Number of samples in the dataset

  • prob_edge – Probability of an edge between two random nodes in the graph

  • random_seed – Seed for generating the random graph

  • prob_type_of_data – 3-element tuple containing the probability of the data being discrete, binary and continuous, respectively

Returns:

ret_dict: dictionary with the dataframe, outcome, treatment, graph string, and the continuous, discrete and binary columns

dowhy.datasets.generate_random_graph(n, max_iter=10)[source]

Generate a random directed acyclic graph.

Parameters:
  • n – number of nodes in the graph

  • max_iter – number of iterations used to create the graph

Returns:

Directed Acyclic Graph

See: https://datascience.oneoffcoder.com/generate-random-bbn.html
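A common way to guarantee acyclicity when sampling a random DAG is to fix a node ordering and only allow forward edges. A standard-library sketch of that idea (illustrative only, not DoWhy's implementation):

```python
import random

def random_dag_edges(n, prob_edge=0.3, seed=None):
    """Sample a random DAG as an edge list: only edges i -> j with i < j
    under a fixed node ordering are allowed, so no cycle can ever form."""
    rng = random.Random(seed)
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if rng.random() < prob_edge]
```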

dowhy.datasets.lalonde_dataset() DataFrame[source]

Downloads and returns the Lalonde dataset from https://users.nber.org/~rdehejia/nswdata2.html

dowhy.datasets.linear_dataset(beta, num_common_causes, num_samples, num_instruments=0, num_effect_modifiers=0, num_treatments=None, num_frontdoor_variables=0, treatment_is_binary=True, treatment_is_category=False, outcome_is_binary=False, stochastic_discretization=True, num_discrete_common_causes=0, num_discrete_instruments=0, num_discrete_effect_modifiers=0, stddev_treatment_noise=1, stddev_outcome_noise=0.01, one_hot_encode=False)[source]

Generate a synthetic dataset with a known effect size.

This function generates a pandas DataFrame with num_samples records. The variables follow a naming convention where the first letter indicates the variable's role in the causal graph, followed by a sequence number.

Parameters:
  • beta (int or list/ndarray of length num_treatments of type int) – coefficient of the treatment(s) (‘v?’) in the generating equation of the outcome (‘y’).

  • num_common_causes (int) – Number of variables affecting both the treatment and the outcome [w -> v; w -> y]

  • num_samples (int) – Number of records to generate

  • num_instruments (int) – Number of instrumental variables [z -> v], defaults to 0

  • num_effect_modifiers (int) – Number of effect modifiers, variables affecting only the outcome [x -> y], defaults to 0

  • num_treatments (None or int) – Number of treatment variables [v]. By default inferred from the beta argument. When provided, beta is recycled to match num_treatments.

  • num_frontdoor_variables (int) – Number of frontdoor mediating variables [v -> FD -> y], defaults to 0

  • treatment_is_binary (bool) – Cannot be True if treatment_is_category is True, defaults to True

  • treatment_is_category (bool) – Cannot be True if treatment_is_binary is True, defaults to False

  • outcome_is_binary (bool) – defaults to False

  • stochastic_discretization (bool) – If False, quartiles are used when discretized variables are specified; they can be one-hot encoded. Defaults to True

  • num_discrete_common_causes (int) – Number of discrete variables out of the total num_common_causes, defaults to 0

  • num_discrete_instruments (int) – Number of discrete instrumental variables out of the total num_instruments, defaults to 0

  • num_discrete_effect_modifiers (int) – Number of discrete effect modifiers out of the total num_effect_modifiers, defaults to 0

  • stddev_treatment_noise (float) – defaults to 1

  • stddev_outcome_noise (float) – defaults to 0.01

  • one_hot_encode (bool) – defaults to False

Returns:

Dictionary with a pandas DataFrame and a few other metadata variables.

“df”: pd.DataFrame with num_samples records. The variables follow a naming convention where the first letter indicates the variable's role in the causal graph, followed by a sequence number.

v variables - the treatments. They can be binary or continuous. In the continuous case, abs(beta) defines their magnitude;

y - is the outcome variable. The generating equation is,

y = normal(0, stddev_outcome_noise) + t @ beta [where @ is numpy matrix multiplication, allowing beta to be a vector]

W variables - common causes of both the treatment and the outcome, and are iid. If continuous, they are Norm(mu = Unif(-1,1), sigma = 1)

Z variables - Instrument variables. Each one affects all treatments. i.e. if there is one instrument and two treatments then z0->v0, z0->v1

X variables - effect modifiers. If continuous, they are Norm(mu = Unif(-1,1), sigma = 1)

FD variables - Front door variables, v0->FD0->y

“treatment_name”: str/list(str)
“outcome_name”: str
“common_causes_names”: str/list(str)
“instrument_names”: str/list(str)
“effect_modifier_names”: str/list(str)
“frontdoor_variables_names”: str/list(str)
“dot_graph”: dot_graph
“gml_graph”: gml_graph
“ate”: float, the true ate in the dataset

Return type:

dict
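The generating equation described above can be mirrored with a small numpy sketch (a simplified, continuous-treatment illustration with two treatments and two common causes; the coefficients linking w to v are made up for this example):

```python
import numpy as np

rng = np.random.default_rng(0)
num_samples = 5000
beta = np.array([10.0, 2.0])          # true treatment coefficients
stddev_outcome_noise = 0.01

# W: common causes, Norm(mu = Unif(-1,1), sigma = 1)
w = rng.normal(rng.uniform(-1, 1, size=2), 1.0, size=(num_samples, 2))
# v: continuous treatments influenced by the common causes plus unit noise
v = w @ rng.uniform(0.5, 1.0, size=(2, 2)) + rng.normal(0, 1, size=(num_samples, 2))
# y: the documented generating equation y = noise + v @ beta, plus a
# confounding contribution from w so that ignoring w would bias an estimate
y = rng.normal(0, stddev_outcome_noise, num_samples) + v @ beta + w.sum(axis=1)

# Regressing y on both v and w recovers the true treatment coefficients
X = np.column_stack([v, w, np.ones(num_samples)])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Since the common causes are observed and included in the regression, the first two coefficients match beta, which is the sense in which the returned “ate” is known by construction.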


dowhy.datasets.partially_linear_dataset(beta, num_common_causes, num_unobserved_common_causes=0, strength_unobserved_confounding=1, num_samples=500, num_treatments=None, treatment_is_binary=True, treatment_is_category=False, outcome_is_binary=False, stochastic_discretization=True, num_discrete_common_causes=0, stddev_treatment_noise=1, stddev_outcome_noise=0, one_hot_encode=False, training_sample_size=10, random_state=0)[source]
dowhy.datasets.psid_dataset() DataFrame[source]

Downloads and returns the PSID dataset from https://users.nber.org/~rdehejia/nswdata2.html

This is a non-experimental comparison group constructed by Lalonde, consisting entirely of control observations.

dowhy.datasets.sales_dataset(start_date: str = '2021-01-01', end_date: str = '2021-12-31', frequency: str = 'd', num_shopping_events: int = 15, original_product_price: int = 1000, product_production_cost: int = 500, based_ad_spending: int = 1000, change_of_price: float = 1.0, change_of_demand: float = 1.25, page_visitor_factor: float = 1.0) DataFrame[source]

Create a sales dataset based on a single product item with daily data.

This closely follows the blog post: https://aws.amazon.com/blogs/opensource/root-cause-analysis-with-dowhy-an-open-source-python-library-for-causal-machine-learning/

Parameters:
  • start_date – The starting date for the dataset, formatted as “YYYY-MM-DD”. Default is “2021-01-01”.

  • end_date – The ending date for the dataset, formatted as “YYYY-MM-DD”. Default is “2021-12-31”.

  • frequency – Frequency for the date range. Default is “d” (daily).

  • num_shopping_events – Number of special shopping events. Default is 15.

  • original_product_price – The initial price of the product. Default is 1000.

  • product_production_cost – Cost of producing one unit of the product. Default is 500.

  • based_ad_spending – Base spending on ad campaigns. Default is 1000.

  • change_of_price – Factor by which the price changes. For example, a value of 0.9 means a 10% decrease. Default is 1.0.

  • change_of_demand – Factor by which the demand changes with a change in price. See https://en.wikipedia.org/wiki/Price_elasticity_of_demand for more information. This influences the number of sold units. Default is 1.25.

  • page_visitor_factor – A factor to adjust the number of page visits. Default is 1.0.

Returns:

A dataframe containing columns related to sales data. The columns of the dataset are:
  • Shopping Event? – binary value indicating whether a special shopping event took place

  • Ad Spend – spending on ad campaigns

  • Page Views – number of visits on the product detail page

  • Unit Price – price of the device, which could vary due to temporary discounts

  • Sold Units – number of sold units

  • Revenue – daily revenue

  • Operational Cost – daily operational expenses

  • Profit – daily profit

dowhy.datasets.sigmoid(x)[source]
dowhy.datasets.simple_iv_dataset(beta, num_samples, num_treatments=None, treatment_is_binary=True, outcome_is_binary=False)[source]

Simple instrumental variable dataset with a single IV and a single confounder.

dowhy.datasets.stochastically_convert_to_three_level_categorical(x)[source]
dowhy.datasets.xy_dataset(num_samples, effect=True, num_common_causes=1, is_linear=True, sd_error=1)[source]

dowhy.do_sampler module

class dowhy.do_sampler.DoSampler(graph: DiGraph, action_nodes: List[str], outcome_nodes: List[str], observed_nodes: List[str], data, params=None, variable_types=None, num_cores=1, keep_original_treatment=False, estimand_type=EstimandType.NONPARAMETRIC_ATE)[source]

Bases: object

Base class for a sampler from the interventional distribution.

Initializes a do sampler with data and names of relevant variables.

Do sampling implements the do() operation from Pearl (2000). This operation is defined on a causal Bayesian network, an explicit implementation of which is the basis for the MCMC sampling method.

We abstract the idea behind the three-step process to allow other methods as well. The disrupt_causes method is the means to make treatment assignment ignorable. In the Pearlian framework, this is where we cut the edges pointing into the causal state. With other methods, this will typically be done using some approach that assumes conditional ignorability (e.g., weighting, or explicit conditioning with Robins' G-formula).

Next, the make_treatment_effective method reflects the assumption that the intervention we impose is “effective”. Most simply, we fix the causal state to some specific value. We skip this step if no value is specified for the causal state, in which case the original values are used instead.

Finally, we sample from the resulting distribution. This can be either from a point_sample method, in the case that the inference method doesn’t support batch sampling, or the sample method in the case that it does. For convenience, the point_sample method parallelizes with multiprocessing, using the num_cores kwarg to set the number of cores to use for parallelization.

While different methods will have their own class attributes, the _df attribute should be common to all methods. This is the temporary dataset, which starts as a copy of the original data and is modified to reflect the steps of the do operation. Read through the existing methods (weighting is likely the most minimal) to get an idea of how this works if you want to implement one yourself.

Parameters:
  • data – pandas.DataFrame containing the data

  • identified_estimand – dowhy.causal_identifier.IdentifiedEstimand: an estimand using a backdoor method for effect identification

  • treatments – list or str: names of the treatment variables

  • outcomes – list or str: names of the outcome variables

  • variable_types – dict: a dictionary containing each variable’s name and type. ‘c’ for continuous, ‘o’ for ordered, ‘d’ for discrete, and ‘u’ for unordered discrete

  • keep_original_treatment – bool: whether to use make_treatment_effective, or to keep the original treatment assignments

  • params – (optional) additional method parameters

disrupt_causes()[source]

Override this method to render treatment assignment conditionally ignorable.

do_sample(x)[source]
make_treatment_effective(x)[source]

This is most likely the implementation you’d like to use, but some methods may require overriding this method to make the treatment effective.

point_sample()[source]
reset()[source]

If your DoSampler has more attributes than the _df attribute, you should reset them all to their initialization values by overriding this method.

sample()[source]

By default, this expects a sampler to be built on class initialization which contains a sample method. Override this method if you want to use a different approach to sampling.
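The three-step pattern described above (disrupt_causes, make_treatment_effective, then sampling) can be sketched as a skeleton class (illustrative only: it operates on a list of dict records rather than a DataFrame, and disrupt_causes is left as a placeholder):

```python
class SketchDoSampler:
    """Illustrative skeleton of the do-sampling pattern, not DoWhy's class."""

    def __init__(self, data, treatment, keep_original_treatment=False):
        self._data = data  # list of dict records, for simplicity
        self._treatment = treatment
        self._keep_original_treatment = keep_original_treatment
        self.reset()

    def reset(self):
        # _df starts as a copy of the original data, as described above
        self._df = [dict(row) for row in self._data]

    def disrupt_causes(self):
        # Placeholder: a real method would e.g. reweight rows or cut the
        # edges pointing into the causal state
        pass

    def make_treatment_effective(self, x):
        # Fix the causal state to the intervened value, unless the original
        # treatment assignments should be kept
        if not self._keep_original_treatment and x is not None:
            for row in self._df:
                row[self._treatment] = x

    def do_sample(self, x):
        self.reset()
        self.disrupt_causes()
        self.make_treatment_effective(x)
        return self._df
```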

dowhy.graph module

This module defines the fundamental interfaces and functions related to causal graphs.

class dowhy.graph.DirectedGraph(*args, **kwargs)[source]

Bases: HasNodes, HasEdges, Protocol

A protocol representing a directed graph as needed by graphical causal models.

This protocol specifically defines a subset of the networkx.DiGraph class, which makes that class automatically compatible with DirectedGraph. While in most cases a networkx.DiGraph is the class of choice when constructing a causal graph, anyone can choose to provide their own implementation of the DirectedGraph interface.

abstract predecessors(node)[source]
class dowhy.graph.HasEdges(*args, **kwargs)[source]

Bases: Protocol

This protocol defines a trait for classes having edges.

abstract property edges

:returns a Dict[Tuple[Any, Any], Dict[Any, Any]]

class dowhy.graph.HasNodes(*args, **kwargs)[source]

Bases: Protocol

This protocol defines a trait for classes having nodes.

abstract property nodes

:returns Dict[Any, Dict[Any, Any]]
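How such structural typing works can be sketched with typing.Protocol (the class names here are illustrative stand-ins, not DoWhy's actual definitions):

```python
from typing import Any, Dict, List, Protocol, runtime_checkable

@runtime_checkable
class HasNodesSketch(Protocol):
    """Illustrative stand-in for a nodes trait."""
    @property
    def nodes(self) -> Dict[Any, Dict[Any, Any]]: ...

class TinyGraph:
    """Satisfies the protocol structurally, without inheriting from it."""
    def __init__(self, node_names: List[str]):
        self._nodes = {name: {} for name in node_names}

    @property
    def nodes(self) -> Dict[Any, Dict[Any, Any]]:
        return self._nodes
```

Because Protocol matching is structural, any class exposing a compatible nodes attribute works, which is what lets networkx.DiGraph satisfy DoWhy's graph protocols without subclassing them.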

dowhy.graph.build_graph(action_nodes: List[str], outcome_nodes: List[str], common_cause_nodes: Optional[List[str]] = None, instrument_nodes=None, effect_modifier_nodes=None, mediator_nodes=None)[source]

Creates nodes and edges based on variable names and their semantics.

Currently only considers the graphical representation of “direct” effect modifiers. Thus, all effect modifiers are assumed to be “direct” unless otherwise expressed using a graph. Based on the taxonomy of effect modifiers by VanderWeele and Robins: “Four types of effect modification: A classification based on directed acyclic graphs.” Epidemiology, 2007.

dowhy.graph.build_graph_from_str(graph_str: str) DiGraph[source]

User-friendly function that returns a networkx graph based on the graph string.

Formats supported: dot, gml, dagitty

The graph_str parameter can refer to the path of a text file containing the encoded graph or contain the actual encoded graph as a string.

Parameters:

graph_str (str) – a string containing the filepath or the encoded graph

Returns:

a networkx directed graph object
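The core of reading a DOT-encoded graph string can be sketched with a regular expression (a toy illustration only; the real function also accepts file paths and the gml and dagitty formats):

```python
import re

def edges_from_dot(graph_str):
    """Pull "a -> b" edge pairs out of a DOT-style graph string.
    A real parser handles attributes, subgraphs, quoting, etc."""
    return re.findall(r"(\w+)\s*->\s*(\w+)", graph_str)
```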

dowhy.graph.check_dseparation(graph: DiGraph, nodes1, nodes2, nodes3, new_graph=None, dseparation_algo='default')[source]
dowhy.graph.check_valid_backdoor_set(graph: DiGraph, nodes1, nodes2, nodes3, backdoor_paths=None, new_graph: Optional[DiGraph] = None, dseparation_algo='default')[source]

Assume that the first parameter (nodes1) is the treatment, the second is the outcome, and the third is the candidate backdoor set

dowhy.graph.check_valid_frontdoor_set(graph: DiGraph, nodes1, nodes2, candidate_nodes, frontdoor_paths=None, new_graph: Optional[DiGraph] = None, dseparation_algo='default')[source]

Check whether candidate_nodes are valid frontdoor variables for the set of treatments (nodes1) to the set of outcomes (nodes2).

dowhy.graph.check_valid_mediation_set(graph: DiGraph, nodes1, nodes2, candidate_nodes, mediation_paths=None)[source]

Check whether candidate_nodes are valid mediators for the set of treatments (nodes1) to the set of outcomes (nodes2).

dowhy.graph.do_surgery(graph: DiGraph, node_names, remove_outgoing_edges=False, remove_incoming_edges=False, target_node_names=None, remove_only_direct_edges_to_target=False)[source]

Method to create a new graph based on the concept of do-surgery.

Parameters:
  • node_names – focal nodes for the surgery

  • remove_outgoing_edges – whether to remove outgoing edges from the focal nodes

  • remove_incoming_edges – whether to remove incoming edges to the focal nodes

  • target_node_names – target nodes (optional) for the surgery, only used when remove_only_direct_edges_to_target is True

  • remove_only_direct_edges_to_target – whether to remove only the direct edges from focal nodes to the target nodes

Returns:

a new networkx graph after the specified removal of edges
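The edge-removal logic of do-surgery can be sketched on a plain edge list (an illustration of the concept, not DoWhy's implementation, which operates on networkx graphs):

```python
def do_surgery_sketch(edges, node_names, remove_outgoing_edges=False,
                      remove_incoming_edges=False):
    """Drop edges into and/or out of the focal nodes, returning a new
    edge list and leaving the original untouched."""
    focal = set(node_names)
    result = []
    for src, dst in edges:
        if remove_outgoing_edges and src in focal:
            continue
        if remove_incoming_edges and dst in focal:
            continue
        result.append((src, dst))
    return result
```

Removing the incoming edges of a treatment node is the graphical counterpart of the do() intervention described in the do_sampler module above.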

dowhy.graph.get_adjacency_matrix(graph: DiGraph, *args, **kwargs)[source]

Get adjacency matrix from the networkx graph

dowhy.graph.get_all_directed_paths(graph: DiGraph, nodes1, nodes2)[source]

Get all directed paths between sets of nodes.

Currently only supports singleton sets.

dowhy.graph.get_all_nodes(graph: DiGraph, observed_nodes: List[Any], include_unobserved_nodes: bool) List[Any][source]
dowhy.graph.get_backdoor_paths(graph: DiGraph, nodes1, nodes2)[source]
dowhy.graph.get_descendants(graph: DiGraph, nodes)[source]
dowhy.graph.get_instruments(graph: DiGraph, treatment_nodes, outcome_nodes)[source]
dowhy.graph.get_ordered_predecessors(causal_graph: DirectedGraph, node: Any) List[Any][source]

This function returns predecessors of a node in a well-defined order.

This is necessary because we select subsets of columns in DataFrames by using a node’s parents, and these parents might not otherwise be returned in a reliable order.

dowhy.graph.has_directed_path(graph: DiGraph, nodes1, nodes2)[source]

Checks if there is any directed path between two sets of nodes.

Currently only supports singleton sets.

dowhy.graph.is_blocked(graph: DiGraph, path, conditioned_nodes)[source]

Uses d-separation criteria to decide if conditioned_nodes block given path.
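The path-blocking rule can be sketched in plain Python (a simplified illustration: a collider is only checked against the conditioning set directly, ignoring its descendants, which a full d-separation check must also consider):

```python
def is_blocked_sketch(edges, path, conditioned):
    """edges: set of (src, dst) pairs; path: node sequence; conditioned:
    conditioning set. Returns True if the path is blocked."""
    E = set(edges)
    Z = set(conditioned)
    for a, m, b in zip(path, path[1:], path[2:]):
        collider = (a, m) in E and (b, m) in E
        if collider:
            if m not in Z:        # simplified: descendants of m ignored
                return True       # an unconditioned collider blocks the path
        elif m in Z:              # chain or fork with conditioned middle node
            return True
    return False
```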

dowhy.graph.is_root_node(causal_graph: DirectedGraph, node: Any) bool[source]
dowhy.graph.node_connected_subgraph_view(g: DirectedGraph, node: Any) Any[source]

Returns a view of the provided graph g that contains only nodes connected to the node passed in

dowhy.graph.validate_acyclic(causal_graph: DirectedGraph) None[source]
dowhy.graph.validate_node_in_graph(causal_graph: HasNodes, node: Any) None[source]

dowhy.graph_learner module

class dowhy.graph_learner.GraphLearner(data, library_class, *args, **kwargs)[source]

Bases: object

Base class for causal discovery methods.

Subclasses implement different discovery methods. All discovery methods are in the package “dowhy.causal_discoverers”

learn_graph()[source]

Discover the causal graph and return the graph in DOT format.

dowhy.interpreter module

class dowhy.interpreter.Interpreter(instance, **kwargs)[source]

Bases: object

Base class for all interpretation methods.

Initialize an interpreter.

Parameters:

instance – An object of type CausalModel, CausalEstimate or CausalRefutation.

SUPPORTED_ESTIMATORS = []
SUPPORTED_MODELS = []
SUPPORTED_REFUTERS = []
interpret()[source]

Method that implements the functionality of an interpreter.

To be overridden by interpreter sub-classes.

dowhy.plotter module

dowhy.plotter.plot_causal_effect(estimate, treatment, outcome)[source]
dowhy.plotter.plot_treatment_outcome(treatment, outcome, time_var)[source]

Module contents

class dowhy.CausalModel(data, treatment, outcome, graph=None, common_causes=None, instruments=None, effect_modifiers=None, estimand_type='nonparametric-ate', proceed_when_unidentifiable=False, missing_nodes_as_confounders=False, identify_vars=False, **kwargs)[source]

Bases: object

Main class for storing the causal model state.

Initialize data and create a causal graph instance.

Assigns treatment and outcome variables. Also checks and finds the common causes and instruments for treatment and outcome.

At least one of graph, common_causes or instruments must be provided. If none of these variables are provided, then learn_graph() can be used later.

Parameters:
  • data – a pandas dataframe containing treatment, outcome and other variables.

  • treatment – name of the treatment variable

  • outcome – name of the outcome variable

  • graph – path to DOT file containing a DAG or a string containing a DAG specification in DOT format

  • common_causes – names of common causes of treatment and outcome. Only used when graph is None.

  • instruments – names of instrumental variables for the effect of treatment on outcome. Only used when graph is None.

  • effect_modifiers – names of variables that can modify the treatment effect. If not provided, then the causal graph is used to find the effect modifiers. Estimators will return multiple different estimates based on each value of effect_modifiers.

  • estimand_type – the type of estimand requested (currently only “nonparametric-ate” is supported). In the future, may support other specific parametric forms of identification.

  • proceed_when_unidentifiable – Binary flag indicating whether identification should proceed by ignoring potential unobserved confounders.

  • missing_nodes_as_confounders – Binary flag indicating whether variables in the dataframe that are not included in the causal graph, should be automatically included as confounder nodes.

  • identify_vars – Binary flag deciding whether to compute common causes, instruments and effect modifiers while initializing the class. identify_vars should be set to False when the user provides common_causes, instruments or effect modifiers on their own (otherwise the identify_vars code can override the user-provided values). It also has no effect when no graph is given.

Returns:

an instance of CausalModel class

do(x, identified_estimand, method_name=None, fit_estimator=True, method_params=None)[source]

Do operator for estimating values of the outcome after intervening on treatment.

Parameters:
  • x – interventional value of the treatment variable

  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – name of the estimation method to be used. See the docs for the estimate_effect method for a list of supported estimation methods.

  • fit_estimator – Boolean flag on whether to fit the estimator. Setting it to False is useful to compute the do-operation on new data using a previously fitted estimator.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method.

Returns:

an instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

estimate_effect(identified_estimand, method_name=None, control_value=0, treatment_value=1, test_significance=None, evaluate_effect_strength=False, confidence_intervals=False, target_units='ate', effect_modifiers=None, fit_estimator=True, method_params=None)[source]

Estimate the identified causal effect.

Currently requires an explicit method name to be specified. Method names follow the convention of identification method followed by the specific estimation method: “[backdoor/iv/frontdoor].estimation_method_name”. For a list of supported methods, check out the User Guide. Here are some examples.

  • Propensity Score Matching: “backdoor.propensity_score_matching”

  • Propensity Score Stratification: “backdoor.propensity_score_stratification”

  • Propensity Score-based Inverse Weighting: “backdoor.propensity_score_weighting”

  • Linear Regression: “backdoor.linear_regression”

  • Generalized Linear Models (e.g., logistic regression): “backdoor.generalized_linear_model”

  • Instrumental Variables: “iv.instrumental_variable”

  • Regression Discontinuity: “iv.regression_discontinuity”

  • Two Stage Regression: “frontdoor.two_stage_regression”

In addition, you can directly call any of the EconML estimation methods. The convention is “[backdoor/iv].econml.path-to-estimator-class”. For example, for the double machine learning estimator (“DML” class) that is located inside “dml” module of EconML, you can use the method name, “backdoor.econml.dml.DML”. See this demo notebook.

Parameters:
  • identified_estimand – a probability expression that represents the effect to be estimated. Output of CausalModel.identify_effect method

  • method_name – name of the estimation method to be used.

  • control_value – Value of the treatment in the control group, for effect estimation. If treatment is multi-variate, this can be a list.

  • treatment_value – Value of the treatment in the treated group, for effect estimation. If treatment is multi-variate, this can be a list.

  • test_significance – Binary flag on whether to additionally do a statistical significance test for the estimate.

  • evaluate_effect_strength – (Experimental) Binary flag on whether to estimate the relative strength of the treatment’s effect. This measure can be used to compare different treatments for the same outcome (by running this method with different treatments sequentially).

  • confidence_intervals – (Experimental) Binary flag indicating whether confidence intervals should be computed.

  • target_units – (Experimental) The units for which the treatment effect should be estimated. This can be of three types. (1) a string for common specifications of target units (namely, “ate”, “att” and “atc”), (2) a lambda function that can be used as an index for the data (pandas DataFrame), or (3) a new DataFrame that contains values of the effect_modifiers and effect will be estimated only for this new data.

  • effect_modifiers – Names of effect modifier variables can be (optionally) specified here too, since they do not affect identification. If None, the effect_modifiers from the CausalModel are used.

  • fit_estimator – Boolean flag on whether to fit the estimator. Setting it to False is useful to estimate the effect on new data using a previously fitted estimator.

  • method_params – Dictionary containing any method-specific parameters. These are passed directly to the estimating method. See the docs for each estimation method for allowed method-specific params.

Returns:

An instance of the CausalEstimate class, containing the causal effect estimate and other method-dependent information

get_common_causes()[source]
get_effect_modifiers()[source]
get_estimator(method_name)[source]

Retrieves an existing CausalEstimator object matching the given method_name.

CausalEstimator objects are created in estimate_effect() and stored in a cache for reuse. Different instances can be created for different methods. They may be reused multiple times on different data with estimate_effect(fit_estimator=False). This is useful for e.g. estimating effects on different samples of the same dataset.

The CausalEstimate object returned by estimate_effect() also has a reference to the CausalEstimator object used to produce it:

effect = model.estimate_effect(…)
effect.estimator  # returns the fitted CausalEstimator object

Parameters:

method_name – name of the estimation method to be used.

Returns:

An instance of CausalEstimator for the given method, if it exists, or None.

get_instruments()[source]
identify_effect(estimand_type=None, method_name='default', proceed_when_unidentifiable=None, optimize_backdoor=False)[source]

Identify the causal effect to be estimated, using properties of the causal graph.

Parameters:
  • method_name – Method name for identification algorithm. (“id-algorithm” or “default”)

  • proceed_when_unidentifiable – Binary flag indicating whether identification should proceed in the presence of (potential) unobserved confounders.

Returns:

a probability expression (estimand) for the causal effect if identified, else None

init_graph(graph, identify_vars)[source]

Initialize self._graph using graph provided by the user.

interpret(method_name=None, **kwargs)[source]

Interpret the causal model.

Parameters:
  • method_name – method used for interpreting the model. If None, a default interpreter is chosen that describes the model summary and shows the associated causal graph.

  • kwargs – Optional parameters that are passed directly to the interpreter method.

Returns:

None

learn_graph(method_name='cdt.causality.graph.LiNGAM', *args, **kwargs)[source]

Learn the causal graph from data. This function takes the method name as input and initializes the causal graph object using the learned graph.

Parameters:
  • self – instance of the CausalModel class (or its subclass)

  • method_name – Exact method name of the object to be imported from the concerned library.

Returns:

an instance of the CausalGraph class initialized with the learned graph.

refute_estimate(estimand, estimate, method_name=None, show_progress_bar=False, **kwargs)[source]

Refute an estimated causal effect.

If method_name is provided, uses the provided method. In the future, we may support automatic selection of suitable refutation tests. The following refutation methods are supported.
  • Adding a randomly-generated confounder: “random_common_cause”

  • Adding a confounder that is associated with both treatment and outcome: “add_unobserved_common_cause”

  • Replacing the treatment with a placebo (random) variable: “placebo_treatment_refuter”

  • Removing a random subset of the data: “data_subset_refuter”

Parameters:
  • estimand – target estimand, an instance of the IdentifiedEstimand class (typically, the output of identify_effect)

  • estimate – estimate to be refuted, an instance of the CausalEstimate class (typically, the output of estimate_effect)

  • method_name – name of the refutation method

  • show_progress_bar – Boolean flag on whether to show a progress bar

  • kwargs – (optional) additional arguments that are passed directly to the refutation method. Can specify a random seed here to ensure reproducible results (‘random_seed’ parameter). For method-specific parameters, consult the documentation for the specific method. All refutation methods are in the causal_refuters subpackage.

Returns:

an instance of the RefuteResult class

refute_graph(k=1, independence_test=None, independence_constraints=None)[source]

Check whether the conditional independencies implied by the input graph hold in the dataset, i.e., ( X ⫫ Y ) | Z. Currently X and Y are treated as singleton sets; Z can contain multiple variables.

Parameters:
  • k – number of covariates in the conditioning set Z

  • independence_test – dictionary containing methods to test conditional independence in the data

  • independence_constraints – list of implications to be tested, input by the user in the format [(x, y, (z1, z2)), (x, y, (z3,))]

Returns:

an instance of the GraphRefuter class

summary(print_to_stdout=False)[source]

Print a text summary of the model.

Returns:

a string containing the summary

view_model(layout=None, size=(8, 6), file_name='causal_model')[source]

View the causal DAG.

Parameters:
  • layout – string specifying the layout of the graph.

  • size – tuple (x, y) specifying the width and height of the figure in inches.

  • file_name – string specifying the file name for the saved causal graph png.

Returns:

a visualization of the graph

class dowhy.EstimandType(value)[source]

Bases: Enum

An enumeration.

NONPARAMETRIC_ATE = 'nonparametric-ate'
NONPARAMETRIC_CDE = 'nonparametric-cde'
NONPARAMETRIC_NDE = 'nonparametric-nde'
NONPARAMETRIC_NIE = 'nonparametric-nie'
dowhy.identify_effect(graph: DiGraph, action_nodes: Union[str, List[str]], outcome_nodes: Union[str, List[str]], observed_nodes: Union[str, List[str]]) IdentifiedEstimand[source]

Identify the causal effect to be estimated based on a causal graph

Parameters:
  • graph – Causal graph to be analyzed

  • action_nodes – name(s) of the treatment variable(s)

  • outcome_nodes – name(s) of the outcome variable(s)

  • observed_nodes – name(s) of the observed variables

Returns:

a probability expression (estimand) for the causal effect if identified, else None

dowhy.identify_effect_auto(graph: DiGraph, action_nodes: Union[str, List[str]], outcome_nodes: Union[str, List[str]], observed_nodes: Union[str, List[str]], estimand_type: EstimandType, conditional_node_names: Optional[List[str]] = None, backdoor_adjustment: BackdoorAdjustment = BackdoorAdjustment.BACKDOOR_DEFAULT, optimize_backdoor: bool = False, costs: Optional[List] = None) IdentifiedEstimand[source]

Main method that returns an identified estimand (if one exists).

If estimand_type is non-parametric ATE, then uses backdoor, instrumental variable, and frontdoor identification methods to check whether an identified estimand exists, based on the causal graph.

Parameters:
  • optimize_backdoor – if True, uses an optimized algorithm to compute the backdoor sets

  • costs – non-negative costs associated with variables in the graph. Only used for estimand_type=“nonparametric-ate” and backdoor_adjustment=“efficient-mincost-adjustment”. If no costs are provided by the user and backdoor_adjustment=“efficient-mincost-adjustment”, costs are assumed to be equal to one for all variables in the graph.

  • conditional_node_names – variables that are used to determine the treatment. If none are provided, it is assumed that the intervention is static.

Returns:

target estimand, an instance of the IdentifiedEstimand class

dowhy.identify_effect_id(graph: DiGraph, action_nodes: Union[str, List[str]], outcome_nodes: Union[str, List[str]]) IDExpression[source]

Implementation of the ID algorithm. See https://ftp.cs.ucla.edu/pub/stat_ser/shpitser-thesis.pdf (pseudocode on p. 40).

Parameters:
  • graph – Causal graph to be analyzed

  • action_nodes – name(s) of the treatment variable(s)

  • outcome_nodes – name(s) of the outcome variable(s)

Returns:

target estimand, an instance of the IDExpression class.