dowhy.causal_refuters package
Submodules
dowhy.causal_refuters.add_unobserved_common_cause module
- class dowhy.causal_refuters.add_unobserved_common_cause.AddUnobservedCommonCause(*args, **kwargs)[source]
Bases:
CausalRefuter
Add an unobserved confounder for refutation.
Supports additional parameters that can be specified in the refute_estimate() method.
‘confounders_effect_on_treatment’: how the simulated confounder affects the value of treatment. This can be linear (for continuous treatment) or binary_flip (for binary treatment)
‘confounders_effect_on_outcome’: how the simulated confounder affects the value of outcome. This can be linear (for continuous outcome) or binary_flip (for binary outcome)
‘effect_strength_on_treatment’: parameter for the strength of the effect of the simulated confounder on treatment. For a linear effect, it is the regression coefficient. For binary_flip, it is the probability that the simulated confounder’s effect flips the value of treatment from 0 to 1 (or vice versa).
‘effect_strength_on_outcome’: parameter for the strength of the effect of the simulated confounder on outcome. For a linear effect, it is the regression coefficient. For binary_flip, it is the probability that the simulated confounder’s effect flips the value of outcome from 0 to 1 (or vice versa).
TODO: Needs scaled version of the parameters and an interpretation module (e.g., in comparison to biggest effect of known confounder)
Initialize the parameters required for the refuter
- Parameters
effect_on_t – str : This is used to represent the type of effect on the treatment due to the unobserved confounder.
effect_on_y – str : This is used to represent the type of effect on the outcome due to the unobserved confounder.
kappa_t – float, numpy.ndarray: This refers to the strength of the confounder on treatment. For a linear effect, it behaves like the regression coefficient. For a binary flip, it is the probability with which it can invert the value of the treatment.
kappa_y – float, numpy.ndarray: This refers to the strength of the confounder on outcome. For a linear effect, it behaves like the regression coefficient. For a binary flip, it is the probability with which it can invert the value of the outcome.
- include_confounders_effect(new_data, kappa_t, kappa_y)[source]
This function changes the values in the data to reflect the effect of the unobserved confounder. In the case of a binary flip, we flip only if the random number is greater than the threshold set. In the case of a linear effect, we use the variable as the regression coefficient.
- Parameters
new_data – pandas.DataFrame: The data to be changed due to the effects of the unobserved confounder.
kappa_t – numpy.float64: The value of the threshold for binary_flip or the value of the regression coefficient for linear effect.
kappa_y – numpy.float64: The value of the threshold for binary_flip or the value of the regression coefficient for linear effect.
- Returns
pandas.DataFrame: The DataFrame that includes the effects of the unobserved confounder.
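The two effect types can be illustrated with a minimal NumPy sketch. It is not the library's implementation: the helper name apply_confounder_effect is hypothetical, the simulated confounder is assumed to be standard normal noise, and the binary flip is assumed to happen independently with probability kappa, following the parameter descriptions above. The library's exact comparison against the threshold may differ.

```python
import numpy as np

def apply_confounder_effect(values, kappa, effect_type, rng):
    """Return a perturbed copy of `values` (hypothetical helper, illustrative only)."""
    values = np.asarray(values, dtype=float)
    n = values.shape[0]
    if effect_type == "linear":
        # Assumption: the simulated confounder is standard normal noise,
        # scaled by kappa as if kappa were a regression coefficient.
        confounder = rng.standard_normal(n)
        return values + kappa * confounder
    if effect_type == "binary_flip":
        # Assumption: each 0/1 value is flipped independently with probability kappa.
        flip = rng.random(n) < kappa
        return np.where(flip, 1.0 - values, values)
    raise ValueError(f"unsupported effect type: {effect_type!r}")

rng = np.random.default_rng(0)
treatment = np.array([0, 1, 1, 0, 1])
never_flipped = apply_confounder_effect(treatment, 0.0, "binary_flip", rng)
always_flipped = apply_confounder_effect(treatment, 1.0, "binary_flip", rng)
```

With kappa set to 0 the data is returned unchanged, and with kappa set to 1 every binary value is inverted, which gives two easy sanity checks for any strength in between.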
- refute_estimate()[source]
This function attempts to add an unobserved common cause to the outcome and the treatment. At present, the behavior is implemented for one-dimensional continuous and binary variables. The function accepts either single-valued inputs or a range of inputs; it inspects the data type of the input to decide on the course of action.
- Returns
CausalRefuter: An object that contains the estimated effect and a new effect and the name of the refutation used.
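The handling of single values versus ranges can be sketched as follows. This is an illustration under stated assumptions, not the refuter's code: the helper names ols_slope and refute_with_confounder are hypothetical, only the linear effect is shown, and the "estimator" is a plain least-squares slope.

```python
import numpy as np

def ols_slope(t, y):
    """Slope of a simple least-squares regression of y on t."""
    t_centered = t - t.mean()
    return float(t_centered @ (y - y.mean()) / (t_centered @ t_centered))

def refute_with_confounder(t, y, kappa_t, kappa_y, seed=0):
    """Hypothetical helper: accepts scalar or array-like strengths."""
    # Scalars become length-1 arrays, so both input kinds share one code path.
    kappa_t = np.atleast_1d(np.asarray(kappa_t, dtype=float))
    kappa_y = np.atleast_1d(np.asarray(kappa_y, dtype=float))
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(t.shape[0])  # simulated unobserved confounder
    new_effects = np.empty((kappa_t.size, kappa_y.size))
    for i, kt in enumerate(kappa_t):
        for j, ky in enumerate(kappa_y):
            # Linear effect on both treatment and outcome, then re-estimate.
            new_effects[i, j] = ols_slope(t + kt * u, y + ky * u)
    return new_effects

rng = np.random.default_rng(1)
t = rng.standard_normal(500)
y = 2.0 * t + rng.standard_normal(500)
single = refute_with_confounder(t, y, 0.0, 0.0)
grid = refute_with_confounder(t, y, [0.0, 0.5, 1.0], [0.0, 0.5])
```

A single pair of strengths yields one new effect, while ranges yield a grid of new effects, one per combination of treatment and outcome strength.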
dowhy.causal_refuters.bootstrap_refuter module
- class dowhy.causal_refuters.bootstrap_refuter.BootstrapRefuter(*args, **kwargs)[source]
Bases:
CausalRefuter
Refute an estimate by running it on a random sample of the data containing measurement error in the confounders. This tests the ability of the estimator to recover the effect of the treatment on the outcome.
It supports additional parameters that can be specified in the refute_estimate() method.
‘num_simulations’: int, CausalRefuter.DEFAULT_NUM_SIMULATIONS by default
The number of simulations to be run.
‘sample_size’: int, size of the original data by default
The size of each bootstrap sample.
‘required_variables’: int, list, bool, True by default
The user can pass an integer value, a list or a bool.
- An integer argument refers to how many variables will be modified.
- A list allows the user to explicitly refer to which variables should be selected to be made noisy. The user can either select the desired variables or deselect the variables that they do not want in the analysis. For example: pass required_variables = [W0,W1] if we want W0 and W1, or required_variables = [-W0,-W1] if we want all variables excluding W0 and W1.
- If the value is True, noise is added to confounders, instrumental variables and effect modifiers. If the value is False, we just bootstrap the existing dataset.
‘noise’: float, BootstrapRefuter.DEFAULT_STD_DEV by default
The standard deviation of the noise to be added to the data.
‘probability_of_change’: float, ‘noise’ by default if the value is less than 1
The probability with which we change the data for a boolean or categorical variable.
‘random_state’: int, RandomState, None by default
The seed value to use if we wish to repeat the same random behavior. To do so, we pass the same seed to the pseudo-random generator.
- DEFAULT_NUMBER_OF_TRIALS = 1
- DEFAULT_STD_DEV = 0.1
- DEFAULT_SUCCESS_PROBABILITY = 0.5
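The idea behind this refuter can be sketched with NumPy alone. The following is a simplified illustration, not the BootstrapRefuter code: the function name bootstrap_estimates is hypothetical, there is a single continuous confounder w, and the estimator is assumed to be an ordinary least-squares regression of the outcome on treatment and confounder.

```python
import numpy as np

def bootstrap_estimates(w, t, y, num_simulations=100, noise=0.1, seed=0):
    """Re-estimate the treatment coefficient on noisy bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = y.shape[0]
    estimates = np.empty(num_simulations)
    for s in range(num_simulations):
        idx = rng.integers(0, n, size=n)  # sample rows with replacement
        # Add measurement error (standard deviation `noise`) to the confounder.
        w_noisy = w[idx] + rng.normal(0.0, noise, size=n)
        design = np.column_stack([np.ones(n), t[idx], w_noisy])
        coef, *_ = np.linalg.lstsq(design, y[idx], rcond=None)
        estimates[s] = coef[1]  # coefficient on the treatment column
    return estimates

rng = np.random.default_rng(42)
w = rng.standard_normal(1000)
t = 0.5 * w + rng.standard_normal(1000)
y = 3.0 * t + 1.0 * w + rng.standard_normal(1000)  # true effect of t is 3.0
estimates = bootstrap_estimates(w, t, y)
```

If the estimator is robust to small measurement error, the distribution of the re-estimated effects should stay close to the original estimate.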
dowhy.causal_refuters.data_subset_refuter module
- class dowhy.causal_refuters.data_subset_refuter.DataSubsetRefuter(*args, **kwargs)[source]
Bases:
CausalRefuter
Refute an estimate by rerunning it on a random subset of the original data.
Supports additional parameters that can be specified in the refute_estimate() method.
‘subset_fraction’: float, 0.8 by default
Fraction of the data to be used for re-estimation.
‘num_simulations’: int, CausalRefuter.DEFAULT_NUM_SIMULATIONS by default
The number of simulations to be run.
‘random_state’: int, RandomState, None by default
The seed value to use if we wish to repeat the same random behavior. To do so, we pass the same seed to the pseudo-random generator.
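A minimal sketch of the subset procedure is shown below. It is illustrative only: the function name subset_estimates is hypothetical, and a simple regression slope stands in for whatever estimator produced the original estimate.

```python
import numpy as np

def subset_estimates(t, y, subset_fraction=0.8, num_simulations=100, random_state=None):
    """Re-estimate a regression slope on random subsets drawn without replacement."""
    rng = np.random.default_rng(random_state)
    n = t.shape[0]
    k = int(subset_fraction * n)
    estimates = np.empty(num_simulations)
    for s in range(num_simulations):
        idx = rng.choice(n, size=k, replace=False)  # random subset of the rows
        slope, _intercept = np.polyfit(t[idx], y[idx], deg=1)
        estimates[s] = slope
    return estimates

rng = np.random.default_rng(3)
t = rng.standard_normal(500)
y = 2.0 * t + rng.standard_normal(500)
first_run = subset_estimates(t, y, random_state=7)
second_run = subset_estimates(t, y, random_state=7)
```

Passing the same random_state twice reproduces the same subsets and therefore the same estimates, which is the repeatability that the parameter description refers to.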
dowhy.causal_refuters.dummy_outcome_refuter module
- class dowhy.causal_refuters.dummy_outcome_refuter.DummyOutcomeRefuter(*args, **kwargs)[source]
Bases:
CausalRefuter
Refute an estimate by replacing the outcome with a randomly generated variable.
Supports additional parameters that can be specified in the refute_estimate() method.
‘num_simulations’: int, CausalRefuter.DEFAULT_NUM_SIMULATIONS by default
The number of simulations to be run
‘transformation_list’: list, DummyOutcomeRefuter.DEFAULT_TRANSFORMATION
The transformation_list is a list of actions to be performed to obtain the outcome. The actions are of the following types:
- function argument: function pd.DataFrame -> np.ndarray
It takes in a function that takes the input data frame as the input and outputs the outcome variable. This allows us to create an output variable that only depends on the covariates and does not depend on the treatment variable.
- string argument
- Currently it supports some common functions like
Linear Regression
K Nearest Neighbours
Support Vector Machine
Neural Network
Random Forest
- On the other hand, there are other options:
1. Permute: This permutes the rows of the outcome, disassociating any effect of the treatment on the outcome.
2. Noise: This adds white noise to the outcome, reducing any causal relationship with the treatment.
3. Zero: This replaces all the values in the outcome with zero.
The transformation_list is of the following form:
- If the function pd.DataFrame -> np.ndarray is already defined:
[(func,func_params),(‘permute’, {‘permute_fraction’: val} ), (‘noise’, {‘std_dev’: val} )]
- If a function from the above list is used:
[(‘knn’,{‘n_neighbors’:5}), (‘permute’, {‘permute_fraction’: val} ), (‘noise’, {‘std_dev’: val} )]
‘required_variables’: int, list, bool, True by default
- The inputs are either an integer value, list or bool.
An integer argument refers to how many variables will be used for estimating the value of the outcome
- A list explicitly refers to which variables will be used to estimate the outcome
Furthermore, it gives the ability to explicitly select or deselect the covariates present in the estimation of the outcome. This is done by either adding or explicitly removing variables from the list. For example: pass required_variables = [W0,W1] if we want W0 and W1, or required_variables = [-W0,-W1] if we want all variables excluding W0 and W1.
If the value is True, we wish to include all variables to estimate the value of the outcome. A False value is INVALID and will result in an error.
Note: These inputs are fed to the function for estimating the outcome variable. The same set of required_variables is used for each instance of an internal function.
‘bucket_size_scale_factor’: float, DummyOutcomeRefuter.DEFAULT_BUCKET_SCALE_FACTOR by default
For continuous data, the scale factor helps us scale the size of the bucket used on the data. Note: The size of each bucket is given by:
(scale_factor * std_dev)
‘min_data_point_threshold’: int, DummyOutcomeRefuter.MIN_DATA_POINT_THRESHOLD by default
The minimum number of data points for an estimator to run.
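The select/deselect convention for required_variables can be sketched in plain Python. The helper below is hypothetical and makes two assumptions that the text above does not state: variable names are strings with a leading "-" marking deselection, and an integer N selects the first N variables.

```python
def choose_variables(all_variables, required_variables):
    """Hypothetical helper resolving `required_variables` to a list of names."""
    # Check booleans first: bool is a subclass of int in Python.
    if required_variables is True:
        return list(all_variables)
    if required_variables is False:
        raise ValueError("False is not a valid value for required_variables here")
    if isinstance(required_variables, int):
        # Assumption: an integer N selects the first N variables.
        return list(all_variables)[:required_variables]
    names = list(required_variables)
    if names and all(name.startswith("-") for name in names):
        # Deselection: keep everything except the named variables.
        excluded = {name[1:] for name in names}
        return [v for v in all_variables if v not in excluded]
    if any(name.startswith("-") for name in names):
        raise ValueError("cannot mix selected and deselected variables")
    return names

covariates = ["W0", "W1", "W2"]
selected = choose_variables(covariates, ["W0", "W1"])
remaining = choose_variables(covariates, ["-W0", "-W1"])
```

Here selected contains W0 and W1, while remaining contains only W2.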
- DEFAULT_BUCKET_SCALE_FACTOR = 0.5
- DEFAULT_STD_DEV = 0.1
- DEFAULT_TRANSFORMATION = [('zero', ''), ('noise', {'std_dev': 1})]
- MIN_DATA_POINT_THRESHOLD = 30
- SUPPORTED_ESTIMATORS = ['linear_regression', 'knn', 'svm', 'random_forest', 'neural_network']
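How a transformation_list of (name, params) pairs is applied in sequence can be sketched as below. This is a simplified stand-in, not the refuter's implementation: it covers only the permute, noise and zero actions, and the dispatch table TRANSFORMS and the function apply_transformations are hypothetical names.

```python
import numpy as np

def permute(outcome, rng, permute_fraction=1.0):
    """Shuffle a fraction of the outcome values among themselves."""
    outcome = outcome.copy()
    n = outcome.shape[0]
    chosen = rng.choice(n, size=int(permute_fraction * n), replace=False)
    outcome[chosen] = outcome[rng.permutation(chosen)]
    return outcome

def noise(outcome, rng, std_dev=1.0):
    """Add white noise with the given standard deviation."""
    return outcome + rng.normal(0.0, std_dev, size=outcome.shape[0])

def zero(outcome, rng):
    """Replace every outcome value with zero."""
    return np.zeros_like(outcome)

TRANSFORMS = {"permute": permute, "noise": noise, "zero": zero}

def apply_transformations(outcome, transformation_list, seed=0):
    """Apply each (name, params) action to the result of the previous one."""
    rng = np.random.default_rng(seed)
    result = np.asarray(outcome, dtype=float)
    for name, params in transformation_list:
        # An empty params entry such as '' is treated as "no parameters".
        result = TRANSFORMS[name](result, rng, **(params or {}))
    return result

y = np.arange(10.0)
dummy = apply_transformations(y, [("zero", ""), ("noise", {"std_dev": 1})])
shuffled = apply_transformations(y, [("permute", {"permute_fraction": 1.0})])
```

The first call mirrors the shape of DEFAULT_TRANSFORMATION: the outcome is zeroed and then replaced by pure noise, so it carries no information about the treatment.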
- check_for_estimator()[source]
This function checks if there is an estimator in the transformation list. If there are no estimators, it allows us to optimize processing by skipping the data preprocessing and running the transformations on the whole dataset.
- preprocess_data_by_treatment()[source]
This function groups data based on the data type of the treatment.
Expected variable types supported for the treatment:
- bool
- pd.categorical
- float
- int
Returns: pandas.core.groupby.generic.DataFrameGroupBy
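Grouping by treatment type might look like the pandas sketch below. It is an assumption-laden illustration rather than the method itself: the function name group_by_treatment is hypothetical, only the float and discrete cases are distinguished, and continuous values are placed in equal-width buckets of size scale_factor * std_dev.

```python
import numpy as np
import pandas as pd

def group_by_treatment(data, treatment_name, bucket_size_scale_factor=0.5):
    """Hypothetical helper: group rows by treatment value or treatment bucket."""
    treatment = data[treatment_name]
    if pd.api.types.is_float_dtype(treatment):
        # Continuous treatment: bucket width is scale_factor * std_dev (assumed).
        bucket_size = bucket_size_scale_factor * treatment.std()
        bucket = np.floor((treatment - treatment.min()) / bucket_size).astype(int)
        return data.groupby(bucket.rename("bucket"))
    # bool, categorical and int treatments are grouped by their values directly.
    return data.groupby(treatment_name)

binary_data = pd.DataFrame({"treated": [True, False, True, False], "y": [1.0, 2.0, 3.0, 4.0]})
continuous_data = pd.DataFrame({"dose": [0.0, 0.1, 5.0, 5.1], "y": [1.0, 2.0, 3.0, 4.0]})
binary_groups = group_by_treatment(binary_data, "treated")
dose_groups = group_by_treatment(continuous_data, "dose")
```

In both examples the result is a DataFrameGroupBy with two groups: one per boolean value in the first case, and one per occupied dose bucket in the second.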
- process_data(X_train, outcome_train, X_validation, outcome_validation, transformation_list)[source]
We process the data by first training the estimators in the transformation_list on X_train and outcome_train. We then apply the estimators on X_validation to get the value of the dummy outcome, which we store in outcome_validation.
‘X_train’: np.ndarray
The data of the covariates which is used to train an estimator. It corresponds to the data of a single category of the treatment.
‘outcome_train’: np.ndarray
This is used to hold the intermediate values of the outcome variable in the transformation list. For example, given
[ (‘permute’, {‘permute_fraction’: val} ), (func,func_params)]
the value obtained from permutation is used as an input for the custom estimator.
‘X_validation’: np.ndarray
The data of the covariates that is fed to a trained estimator to generate a dummy outcome.
‘outcome_validation’: np.ndarray
This variable stores the dummy_outcome generated by the transformations.
‘transformation_list’: np.ndarray
The list of transformations on the outcome data required to produce a dummy outcome.
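The train-then-predict step can be sketched with a least-squares fit standing in for the estimator. The names fit_linear, predict_linear and make_dummy_outcome are hypothetical, and the real method also applies the other transformations in the list.

```python
import numpy as np

def fit_linear(X, y):
    """Fit an ordinary least-squares model with an intercept."""
    design = np.column_stack([np.ones(X.shape[0]), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

def predict_linear(coef, X):
    """Predict with coefficients returned by fit_linear."""
    return np.column_stack([np.ones(X.shape[0]), X]) @ coef

def make_dummy_outcome(X_train, outcome_train, X_validation):
    """Train on one part of the data and generate the dummy outcome for another."""
    coef = fit_linear(X_train, outcome_train)
    return predict_linear(coef, X_validation)

rng = np.random.default_rng(5)
X_train = rng.standard_normal((50, 2))
X_validation = rng.standard_normal((10, 2))
outcome_train = 1.0 + 2.0 * X_train[:, 0] - X_train[:, 1]  # depends on covariates only
outcome_validation = make_dummy_outcome(X_train, outcome_train, X_validation)
```

Because the estimator sees only the covariates, the generated outcome_validation cannot depend on the treatment, which is what makes it a dummy outcome.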
dowhy.causal_refuters.placebo_treatment_refuter module
- class dowhy.causal_refuters.placebo_treatment_refuter.PlaceboTreatmentRefuter(*args, **kwargs)[source]
Bases:
CausalRefuter
Refute an estimate by replacing treatment with a randomly-generated placebo variable.
Supports additional parameters that can be specified in the refute_estimate() method.
‘placebo_type’: str, None by default
Default is to generate random values for the treatment. If placebo_type is “permute”, then the original treatment values are permuted by row.
‘num_simulations’: int, CausalRefuter.DEFAULT_NUM_SIMULATIONS by default
The number of simulations to be run.
‘random_state’: int, RandomState, None by default
The seed value to use if we wish to repeat the same random behavior. To do so, we pass the same seed to the pseudo-random generator.
- DEFAULT_MEAN_OF_NORMAL = 0
- DEFAULT_NUMBER_OF_TRIALS = 1
- DEFAULT_PROBABILITY_OF_BINOMIAL = 0.5
- DEFAULT_STD_DEV_OF_NORMAL = 0
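The placebo logic can be sketched for a binary treatment as follows. This is an illustration, not the refuter's code: the function name placebo_estimates is hypothetical, the estimator is a difference in group means, and the random placebo is assumed to be drawn from a binomial distribution with one trial and probability 0.5, in line with the defaults listed above.

```python
import numpy as np

def placebo_estimates(t, y, placebo_type=None, num_simulations=100, random_state=None):
    """Estimate the 'effect' of a placebo treatment that cannot affect y."""
    rng = np.random.default_rng(random_state)
    estimates = np.empty(num_simulations)
    for s in range(num_simulations):
        if placebo_type == "permute":
            placebo = rng.permutation(t)  # shuffle the original treatment by row
        else:
            # Assumption: binary placebo from Binomial(n=1, p=0.5).
            placebo = rng.binomial(1, 0.5, size=t.shape[0])
        estimates[s] = y[placebo == 1].mean() - y[placebo == 0].mean()
    return estimates

rng = np.random.default_rng(11)
t = rng.binomial(1, 0.5, size=2000)
y = 2.0 * t + rng.standard_normal(2000)  # true effect of t is 2.0
original_estimate = y[t == 1].mean() - y[t == 0].mean()
random_placebo = placebo_estimates(t, y, random_state=0)
permuted_placebo = placebo_estimates(t, y, placebo_type="permute", random_state=0)
```

A sound estimator should report an effect close to zero for both kinds of placebo, in contrast to the original estimate of roughly 2.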
dowhy.causal_refuters.random_common_cause module
- class dowhy.causal_refuters.random_common_cause.RandomCommonCause(*args, **kwargs)[source]
Bases:
CausalRefuter
Refute an estimate by introducing a randomly generated confounder (that may have been unobserved).
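The reasoning behind this check can be sketched as below: adding a covariate that is independent of both treatment and outcome should leave the estimate essentially unchanged. The helper treatment_coefficient is hypothetical and uses a plain least-squares regression as the estimator.

```python
import numpy as np

def treatment_coefficient(t, y, covariates):
    """Coefficient on t in a least-squares regression of y on t and covariates."""
    design = np.column_stack([np.ones(t.shape[0]), t, covariates])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])

rng = np.random.default_rng(21)
n = 1000
w = rng.standard_normal(n)                      # observed confounder
t = 0.5 * w + rng.standard_normal(n)
y = 3.0 * t + 1.0 * w + rng.standard_normal(n)  # true effect of t is 3.0

original_estimate = treatment_coefficient(t, y, w)
random_cause = rng.standard_normal(n)           # independent of t and y
refuted_estimate = treatment_coefficient(t, y, np.column_stack([w, random_cause]))
```

A large gap between original_estimate and refuted_estimate would suggest that the estimator is sensitive to irrelevant covariates.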