dowhy.causal_refuters.overrule.BCS package
Submodules
dowhy.causal_refuters.overrule.BCS.beam_search module
Beam search utilities for optimization.
This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from https://github.com/clinicalml/overlap-code, under the MIT License.
[1] Oberst, M., Johansson, F., Wei, D., Gao, T., Brat, G., Sontag, D., & Varshney, K. (2020). Characterization of Overlap in Observational Studies. In S. Chiappa & R. Calandra (Eds.), Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics (Vol. 108, pp. 788–798). PMLR. https://arxiv.org/abs/1907.04138
- class dowhy.causal_refuters.overrule.BCS.beam_search.PricingInstance(rp, rn, Xp, Xn, v0, z0)[source]
- Bases: object
- Instance of the pricing problem.
- For more details, see: Dash, S., Gunluk, O., and Wei, D. (2018). Boolean decision rules via column generation. In Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., and Garnett, R., editors, Advances in Neural Information Processing Systems 31, pages 4660–4670. Curran Associates, Inc.
- dowhy.causal_refuters.overrule.BCS.beam_search.beam_search(r, X, lambda0: float, lambda1: float, K: int = 1, UB: float = 0, D: int = 10, B: int = 5, wLB: float = 0.5, eps: float = 1e-06)[source]
- Beam search to generate solutions to pricing problem. - Parameters:
- r – Cost vector (residuals) 
- X – Binary features in a DataFrame 
- lambda0 (float) – Fixed cost of a term 
- lambda1 (float) – Cost per literal 
- K (int, optional) – Maximum number of solutions returned, defaults to 1 
- UB (float, optional) – Initial upper bound on value of solutions, defaults to 0 
- D (int, optional) – Maximum degree, defaults to 10 
- B (int, optional) – Beam width, defaults to 5 
- wLB (float, optional) – Weight on lower bound in evaluating nodes, defaults to 0.5 
- eps (float, optional) – Numerical tolerance on comparisons, defaults to 1e-6 
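
To make the roles of these parameters concrete, here is a minimal sketch of a beam search over conjunctions of binary features. It is not the library's implementation: the function name `beam_search_sketch`, the dict-of-columns input, and the objective (fixed cost plus per-literal cost plus the sum of `r` over covered rows, lower is better) are assumptions for illustration, and the bound-based pruning controlled by `UB`, `wLB`, and `eps` is omitted.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple


def beam_search_sketch(
    r: Sequence[float],
    X: Dict[str, Sequence[int]],
    lambda0: float,
    lambda1: float,
    K: int = 1,
    D: int = 10,
    B: int = 5,
) -> List[Tuple[float, FrozenSet[str]]]:
    """Return up to K lowest-cost conjunctions of binary features (illustrative only)."""
    n = len(r)

    def cost(literals: FrozenSet[str]) -> float:
        # A row is covered when every literal in the conjunction equals 1.
        covered = [i for i in range(n) if all(X[f][i] for f in literals)]
        # Assumed objective: fixed cost + cost per literal + residuals of covered rows.
        return lambda0 + lambda1 * len(literals) + sum(r[i] for i in covered)

    beam: List[FrozenSet[str]] = [frozenset()]
    seen: Dict[FrozenSet[str], float] = {}
    for _ in range(D):  # each level adds one literal, so D caps the degree
        children = {node | {f} for node in beam for f in X if f not in node}
        children = {c for c in children if c not in seen}
        if not children:
            break
        for c in children:
            seen[c] = cost(c)
        # Keep only the B most promising children for the next level.
        ranked = sorted(children, key=lambda c: (seen[c], sorted(c)))
        beam = ranked[:B]
    best = sorted(seen.items(), key=lambda kv: (kv[1], sorted(kv[0])))
    return [(value, literals) for literals, value in best[:K]]


r = [-1.0, -1.0, 0.5, 0.5]
X = {"a": [1, 1, 0, 0], "b": [1, 0, 1, 0], "c": [1, 1, 1, 0]}
solutions = beam_search_sketch(r, X, lambda0=0.1, lambda1=0.1, K=2)
```

In this toy input the single literal `a` covers exactly the two rows with negative residuals, so it ranks first; adding `c` covers the same rows but pays for one more literal.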
 
 
dowhy.causal_refuters.overrule.BCS.load_process_data_BCS module
Code for Binarizing Features.
This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from https://github.com/clinicalml/overlap-code, under the MIT License.
- class dowhy.causal_refuters.overrule.BCS.load_process_data_BCS.FeatureBinarizer(colCateg: List[str] = [], numThresh: int = 9, negations: bool = False, threshStr: bool = False, threshOverride: Dict = {}, **kwargs)[source]
- Bases: TransformerMixin
- Transformer for binarizing categorical and ordinal (including continuous) features. Note that all features are converted into binary variables before learning Boolean rules.
- Parameters:
- colCateg (List[str], optional) – List of categorical columns, defaults to [], ‘object’ dtype automatically treated as categorical 
- numThresh (int, optional) – Number of quantile thresholds to binarize ordinal features, defaults to 9 
- negations (bool, optional) – Include negations, defaults to False 
- threshStr (bool, optional) – Convert thresholds to strings, defaults to False 
- threshOverride (Dict, optional) – Dictionary to override quantile thresholds, defaults to {}, formatted as {colname : np.linspace object} to define cuts 
 
 - fit(X)[source]
- Fit to data, including the learning of thresholds where appropriate.
- Sets the following internal variables:
  * maps = dictionary of mappings for unary/binary columns
  * enc = dictionary of OneHotEncoders for categorical columns
  * thresh = dictionary of lists of thresholds for ordinal columns
  * NaN = list of ordinal columns containing NaN values
- Parameters:
- X (pd.DataFrame) – Original features as a Pandas Dataframe 
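
As an illustration of what learning quantile thresholds for an ordinal column can look like, here is a minimal standard-library sketch. It is not the `FeatureBinarizer` code: the helper names `fit_thresholds` and `binarize`, the `<=` comparison, the column labels, and the choice of quantile interpolation are all assumptions.

```python
import statistics
from typing import Dict, List, Sequence


def fit_thresholds(values: Sequence[float], num_thresh: int = 9) -> List[float]:
    """Learn quantile cut points for one ordinal column, ignoring NaN values."""
    clean = [v for v in values if v == v]  # NaN is the only value not equal to itself
    # n = num_thresh + 1 intervals gives num_thresh cut points (9 -> deciles).
    cuts = statistics.quantiles(clean, n=num_thresh + 1, method="inclusive")
    return sorted(set(cuts))  # drop duplicate thresholds


def binarize(values: Sequence[float], thresholds: Sequence[float]) -> Dict[str, List[int]]:
    """Turn one ordinal column into one binary column per threshold."""
    # Comparisons with NaN are False, so NaN rows become 0 in every column.
    return {f"<= {t}": [int(v <= t) for v in values] for t in thresholds}


ages = [1, 2, 3, 4, 5]
thresholds = fit_thresholds(ages, num_thresh=3)
binary_columns = binarize(ages, thresholds)
```

With `num_thresh=3` the cut points are the three quartiles of the column, and each resulting binary column marks the rows at or below one cut point.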
 
 
dowhy.causal_refuters.overrule.BCS.overlap_boolean_rule module
OverlapBooleanRule.
This module implements the boolean ruleset estimator from OverRule [1]. Code is adapted (with some simplifications) from https://github.com/clinicalml/overlap-code, under the MIT License.
- class dowhy.causal_refuters.overrule.BCS.overlap_boolean_rule.OverlapBooleanRule(alpha=0.95, lambda0=0.01, lambda1=0.01, K=20, D=20, B=10, iterMax=10, eps=1e-06, silent=False, verbose=False, solver='ECOS', rounding='greedy_sweep')[source]
- Bases: object
- Overlap Boolean Rule class in the style of scikit-learn. Learns Boolean rules in Disjunctive Normal Form (DNF) to describe the positive class.
- Parameters:
- alpha (float, optional) – Fraction of the positive samples to ensure are included in the rules, defaults to 0.95 
- lambda0 (float, optional) – Regularization on the number of rules, defaults to 1e-2 
- lambda1 (float, optional) – Regularization on the number of literals, defaults to 1e-2 
- K (int, optional) – Maximum results returned during beam search, defaults to 20 
- D (int, optional) – Maximum extra rules per beam search iteration, defaults to 20 
- B (int, optional) – Width of beam search, defaults to 10 
- iterMax (int, optional) – Maximum number of iterations of column generation, defaults to 10 
- eps (float, optional) – Numerical tolerance on comparisons, defaults to 1e-6 
- silent (bool, optional) – Silence non-optimizer output, defaults to False 
- verbose (bool, optional) – Verbose optimizer output, defaults to False 
- solver (str, optional) – Linear programming solver used by CVXPY to solve the LP relaxation, defaults to ‘ECOS’ 
- rounding (str, optional) – Strategy to perform rounding, either ‘greedy’ or ‘greedy_sweep’, defaults to ‘greedy_sweep’ 
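
For readers unfamiliar with DNF rulesets, the following sketch shows how a set of conjunctions classifies a sample and how the two regularization terms could penalize its size. The helper names `predict_dnf` and `complexity` and the list-of-lists rule representation are hypothetical, and the penalty formula is an assumption based on the parameter descriptions above.

```python
from typing import Dict, List

Rule = List[str]  # one conjunction: names of binary features that must all equal 1


def predict_dnf(rules: List[Rule], row: Dict[str, int]) -> int:
    """Return 1 if at least one conjunction is fully satisfied by the row, else 0."""
    return int(any(all(row[f] == 1 for f in rule) for rule in rules))


def complexity(rules: List[Rule], lambda0: float = 0.01, lambda1: float = 0.01) -> float:
    """Assumed penalty: lambda0 per rule plus lambda1 per literal."""
    return lambda0 * len(rules) + lambda1 * sum(len(rule) for rule in rules)


rules = [["age<=40", "smoker"], ["bmi<=25"]]
inside = predict_dnf(rules, {"age<=40": 1, "smoker": 0, "bmi<=25": 1})
outside = predict_dnf(rules, {"age<=40": 1, "smoker": 0, "bmi<=25": 0})
penalty = complexity(rules)
```

The first row is accepted because the second conjunction holds even though the first does not; the second row satisfies neither conjunction.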
 
 - fit(X: DataFrame, y: Union[ndarray, DataFrame])[source]
- Fit model to training data. - Parameters:
- X – Pandas DataFrame containing covariates 
- y – +1 for Overlap/Support (depending on rules being learned), 0 for non-overlap, and -1 for background samples. Should only contain (+1/0) for overlap rules, or (+1/-1) for learning support rules. 
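
The label convention above can be checked before fitting. The helper below is hypothetical (it is not part of the package) and simply reports which of the two documented label sets a vector uses.

```python
from typing import Iterable


def label_mode(y: Iterable[int]) -> str:
    """Report whether y follows the overlap (+1/0) or support (+1/-1) convention."""
    labels = {int(v) for v in y}
    if labels == {1, 0}:
        return "overlap"
    if labels == {1, -1}:
        return "support"
    raise ValueError(f"expected labels {{+1, 0}} or {{+1, -1}}, got {sorted(labels)}")


mode = label_mode([1, 0, 1, 1, 0])
```

A vector that mixes 0 and -1 raises a `ValueError` here, since the description allows only one of the two conventions per fit.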
 
 
 - greedy_round_(X: DataFrame, y: Union[ndarray, DataFrame], xi: float = 0.5, use_lp: bool = False)[source]
- Round the rule coefficients to integer values. - For DNF, this starts with no conjunctions and adds them greedily based on a cost that penalizes (any) inclusion of negative samples and rewards (new) inclusion of positive samples, continuing until at least an alpha fraction of the positive samples is covered. - Parameters:
- X – Pandas DataFrame containing covariates 
- y – +1 for Overlap/Support (depending on rules being learned), 0 for non-overlap, and -1 for background samples. Should only contain (+1/0) for overlap rules, or (+1/-1) for learning support rules. 
- xi – Reward for including positive samples, relative to cost (1) for including negative samples 
- use_lp – Restrict to those conjunctions where the LP coefficients are positive. Note that the LP makes a difference regardless, as we only consider the rules generated by column generation here. 
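
The greedy procedure described above can be sketched in a few lines. This is an illustration under stated assumptions, not the method's actual code: the function name `greedy_round_sketch`, the `cover` input (one 0/1 column per candidate conjunction), and the exact cost, taken here as the number of negatives covered minus `xi` times the number of newly covered positives, are all hypothetical.

```python
from typing import Dict, List, Sequence, Set


def greedy_round_sketch(
    cover: Dict[str, Sequence[int]],
    y: Sequence[int],
    alpha: float = 0.95,
    xi: float = 0.5,
) -> List[str]:
    """Greedily pick conjunctions until an alpha fraction of positives is covered.

    cover[name][i] is 1 when conjunction `name` is satisfied by sample i.
    """
    pos = {i for i, label in enumerate(y) if label == 1}
    neg = set(range(len(y))) - pos
    chosen: List[str] = []
    covered_pos: Set[int] = set()
    while len(covered_pos) < alpha * len(pos):
        best_name, best_cost = None, None
        for name, col in cover.items():
            if name in chosen:
                continue
            hits = {i for i, v in enumerate(col) if v}
            new_pos = (hits & pos) - covered_pos
            if not new_pos:
                continue  # adds no new positive samples, so it cannot help
            # Every covered negative costs 1; only newly covered positives earn xi.
            cost = len(hits & neg) - xi * len(new_pos)
            if best_cost is None or cost < best_cost:
                best_name, best_cost = name, cost
        if best_name is None:
            break  # no remaining conjunction covers a new positive sample
        chosen.append(best_name)
        covered_pos |= {i for i, v in enumerate(cover[best_name]) if v} & pos
    return chosen


y = [1, 1, 1, 1, 0, 0]
cover = {
    "A": [1, 1, 0, 0, 0, 0],
    "B": [0, 0, 1, 1, 1, 0],
    "C": [1, 1, 1, 1, 1, 1],
}
precise = greedy_round_sketch(cover, y, alpha=1.0, xi=0.5)
broad = greedy_round_sketch(cover, y, alpha=1.0, xi=2.0)
```

With a small reward (`xi=0.5`) the sketch prefers the two narrow conjunctions `A` and `B`; with a large reward (`xi=2.0`) it accepts the broad conjunction `C`, which covers every positive sample at the price of both negatives.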
 
 
 - round_(X: DataFrame, y: Union[ndarray, DataFrame], scoring: str = 'greedy', xi=None, use_lp: bool = True)[source]
- Round the rule coefficients to integer values via a greedy approach, either using a fixed reward (scoring="greedy") or optimizing the reward for including positive examples according to balanced accuracy on classifying positive vs negative samples (scoring="greedy_sweep"). - Parameters:
- X – Pandas DataFrame containing covariates 
- y – +1 for Overlap/Support (depending on rules being learned), 0 for non-overlap, and -1 for background samples. Should only contain (+1/0) for overlap rules, or (+1/-1) for learning support rules. 
- xi – Reward for including positive samples, relative to cost (1) for including negative samples. For scoring="greedy", should be a single value, or an array of values for scoring="greedy_sweep". For the latter, will default to np.logspace(np.log10(0.01), 0.5, 20). 
- use_lp – Restrict to those conjunctions where the LP coefficients are positive. Note that the LP makes a difference regardless, as we only consider the rules generated by column generation here.
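
The sweep over `xi` can be pictured as follows. This standard-library sketch reproduces the default grid described above without NumPy and selects the reward whose predictions score best on balanced accuracy. The helper names and the `predictions_by_xi` input are hypothetical; in practice each candidate prediction vector would come from a greedy rounding run at that `xi`.

```python
from typing import Dict, List, Sequence


def xi_grid(start: float = -2.0, stop: float = 0.5, num: int = 20) -> List[float]:
    """Log-spaced grid matching np.logspace(np.log10(0.01), 0.5, 20) up to rounding."""
    step = (stop - start) / (num - 1)
    return [10 ** (start + i * step) for i in range(num)]


def balanced_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Mean of the true positive rate and the true negative rate."""
    pos = [p for t, p in zip(y_true, y_pred) if t == 1]
    neg = [p for t, p in zip(y_true, y_pred) if t != 1]
    tpr = sum(1 for p in pos if p == 1) / len(pos)
    tnr = sum(1 for p in neg if p != 1) / len(neg)
    return (tpr + tnr) / 2


def best_xi(y_true: Sequence[int], predictions_by_xi: Dict[float, Sequence[int]]) -> float:
    """Pick the xi whose rounded ruleset has the highest balanced accuracy."""
    return max(predictions_by_xi, key=lambda xi: balanced_accuracy(y_true, predictions_by_xi[xi]))


grid = xi_grid()  # 20 values from 0.01 up to 10 ** 0.5
y = [1, 1, 0, 0]
candidates = {0.01: [0, 0, 0, 0], 1.0: [1, 1, 0, 0], 3.0: [1, 1, 1, 1]}
chosen = best_xi(y, candidates)
```

Balanced accuracy weights both classes equally, so a ruleset that accepts everything or rejects everything scores only 0.5 regardless of class sizes.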