dowhy.utils package


dowhy.utils.api module


dowhy.utils.cit module

dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)[source]

Compute parametric confidence intervals around a correlation coefficient.

This is done by applying Fisher’s r-to-z transform: z = .5[ln((1+r)/(1-r))] = arctanh(r)

The standard error is 1/sqrt(N-3), where N is the sample size.

The critical value of the normal distribution for the given confidence level is calculated from stats.norm.ppf((1 - alpha)/2) for a two-tailed test. Here alpha denotes the confidence level (for example 0.95), so the quantile is about -1.96 and its magnitude is used.

The lower and upper confidence limits in z space are calculated with the formula z ± critical value * standard error.

The confidence interval is then converted back to r space by applying tanh.

:param r : correlation coefficient
:param nx : length of vector x
:param ny : length of vector y
:param confidence : confidence level (0.95 = 95%)

:returns : array containing the confidence interval
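The steps above can be sketched with only the Python standard library. This is an illustration of the Fisher transform procedure, not DoWhy's implementation: the name `fisher_ci`, its single sample-size argument `n`, and the use of `statistics.NormalDist` in place of `stats.norm.ppf` are assumptions made for the example.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation coefficient r from n samples."""
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    if n <= 3:
        raise ValueError("n must be greater than 3")
    z = math.atanh(r)                 # Fisher r-to-z transform
    se = 1.0 / math.sqrt(n - 3)       # standard error in z space
    # Two-tailed critical value: magnitude of the lower-tail normal quantile.
    crit = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    lower, upper = z - crit * se, z + crit * se
    return math.tanh(lower), math.tanh(upper)   # convert back to r space


low, high = fisher_ci(r=0.5, n=30)
print(f"95% CI for r=0.5, n=30: ({low:.3f}, {high:.3f})")
```

Because the interval is symmetric in z space but is mapped back through tanh, it is generally not symmetric around r.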

dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)[source]

Method to return the conditional mutual information between X and Y given Z:

I(X, Y | Z) = H(X|Z) - H(X|Y,Z)
            = H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)
            = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

:param data : dataset
:param x,y,z : column names from dataset
:returns : conditional mutual information between X and Y given Z
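The entropy identity above can be checked on small discrete samples. The sketch below is not DoWhy's code: it works on plain Python sequences rather than dataset columns, estimates probabilities by counting, and uses base-2 logarithms, all of which are assumptions made for illustration. The names `entropy` and `conditional_mi` are local to the example.

```python
import math
from collections import Counter


def entropy(samples):
    """Empirical Shannon entropy (in bits) of a sequence of hashable values."""
    counts = Counter(samples)
    total = len(samples)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conditional_mi(x, y, z):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z) from paired samples."""
    xz = list(zip(x, z))
    yz = list(zip(y, z))
    xyz = list(zip(x, y, z))
    return entropy(xz) + entropy(yz) - entropy(xyz) - entropy(list(z))


# Y copies X and Z says nothing about either, so I(X, Y | Z) = H(X) = 1 bit.
cmi_dependent = conditional_mi([0, 1, 0, 1], [0, 1, 0, 1], [0, 0, 1, 1])

# X and Y are independent and Z is constant, so I(X, Y | Z) = 0.
cmi_independent = conditional_mi([0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 0])
```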


Returns the entropy of a random variable x:

H(x) = - Σ p(x) log(p(x))

:param x : random variable to calculate entropy for
:returns : entropy of the random variable

dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')[source]

Calculate the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: x ~ z and y ~ z. See: 1

:param data : pandas dataframe
:param x : column name in data
:param y : column name in data
:param z : string or list
:param method : string denoting the correlation type, “pearson” or “spearman”

:returns : a Python dictionary with the keys

n: sample size
r: partial correlation coefficient
CI95: 95% parametric confidence intervals
p-val: p-value
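The residual-based definition can be illustrated for the simplest case of one conditioning variable and Pearson correlation. This is a standard-library sketch under those assumptions, not the library's implementation; `partial_corr_single` and the helper names are hypothetical, and the data are made up so that the result is easy to reason about.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _residuals(target, predictor):
    """Residuals of the least-squares fit target ~ predictor (with intercept)."""
    mt, mp = _mean(target), _mean(predictor)
    sxy = sum((p - mp) * (t - mt) for p, t in zip(predictor, target))
    sxx = sum((p - mp) ** 2 for p in predictor)
    slope = sxy / sxx
    intercept = mt - slope * mp
    return [t - (intercept + slope * p) for p, t in zip(predictor, target)]


def _pearson(a, b):
    ma, mb = _mean(a), _mean(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_single(x, y, z):
    """Pearson correlation between the residuals of x ~ z and y ~ z."""
    return _pearson(_residuals(x, z), _residuals(y, z))


z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
noise = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
x = [2 * zi + ni for zi, ni in zip(z, noise)]    # x rises with z
y = [-3 * zi + ni for zi, ni in zip(z, noise)]   # y falls with z

raw_r = _pearson(x, y)                    # dominated by the opposite trends in z
partial_r = partial_corr_single(x, y, z)  # shared noise remains once z is removed
```

In this constructed example the raw correlation is negative, while the partial correlation is positive because x and y share the same noise term once the linear effect of z is regressed out.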

dowhy.utils.cli_helpers module

dowhy.utils.cli_helpers.query_yes_no(question, default=True)[source]

Ask a yes/no question via standard input and return the answer.


If invalid input is given, the user is asked again until they give valid input.

Side Effects: Blocks program execution until valid input (y/n) is given.

  • question(str) – A question that is presented to the user.

  • default(bool|None) – The default value when Enter is pressed with no value. When None, there is no default value and the query will loop.


A bool indicating whether the user has entered yes or no.
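The prompting behaviour described above can be sketched as a small loop. This is a hypothetical re-implementation for illustration only: the name `ask_yes_no`, the accepted spellings ("y", "yes", "n", "no"), the prompt hints, and the injectable `input_fn` parameter (added so the loop can be driven without a keyboard) are all assumptions and are not taken from DoWhy.

```python
def ask_yes_no(question, default=True, input_fn=input):
    """Keep prompting until the reply is a recognisable yes or no."""
    answers = {"y": True, "yes": True, "n": False, "no": False}
    if default is None:
        hint = "[y/n]"
    else:
        hint = "[Y/n]" if default else "[y/N]"
    while True:
        reply = input_fn(f"{question} {hint} ").strip().lower()
        if reply == "" and default is not None:
            return default          # bare Enter falls back to the default
        if reply in answers:
            return answers[reply]
        print("Please answer 'y' or 'n'.")


# Scripted replies stand in for a user at the keyboard:
# an invalid answer first, then a bare Enter.
scripted = iter(["maybe", ""])
result = ask_yes_no("Continue?", default=True, input_fn=lambda prompt: next(scripted))
```

With `default=None`, a bare Enter is treated like any other invalid reply and the question is asked again.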

dowhy.utils.dgp module

class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)[source]

Bases: object

Base class for implementation of data generating process.

Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.

convert_to_binary(data, deterministic=False)[source]

dowhy.utils.graph_operations module

dowhy.utils.graph_operations.add_edge(i, j, g)[source]

Adds an edge i –> j to the graph, g. The edge is only added if this addition does NOT cause the graph to have cycles.
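One way to enforce the no-cycle condition is to check, before adding i –> j, whether j can already reach i. The sketch below assumes a graph stored as a dictionary mapping each node to the set of its children, which is an assumption for the example and not the graph type DoWhy uses; `reachable`, `add_edge_if_acyclic`, and the boolean return value are hypothetical.

```python
def reachable(graph, start, goal):
    """Return True if goal can be reached from start along directed edges."""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(i, j, graph):
    """Add i -> j unless j already reaches i, which would close a cycle."""
    if reachable(graph, j, i):
        return False
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


dag = {0: {1}, 1: {2}, 2: set()}
added_forward = add_edge_if_acyclic(0, 2, dag)   # keeps the graph acyclic
added_back = add_edge_if_acyclic(2, 0, dag)      # would close 0 -> ... -> 2 -> 0
```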

dowhy.utils.graph_operations.adjacency_matrix_to_adjacency_list(adjacency_matrix, labels=None)[source]

Convert the adjacency matrix of a graph to an adjacency list.

  • adjacency_matrix – A numpy array representing the graph adjacency matrix.

  • labels – List of labels.


Adjacency list as a dictionary.
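As an illustration of the conversion, the sketch below uses a nested list in place of a numpy array and assumes that a non-zero entry in row i, column j means an edge from node i to node j. The function name and the fallback to integer labels are assumptions for the example; the exact shape of the dictionary DoWhy returns may differ.

```python
def matrix_to_adjacency_list(adjacency_matrix, labels=None):
    """Map each node to the list of nodes it has an edge to."""
    n = len(adjacency_matrix)
    if labels is None:
        labels = list(range(n))     # fall back to integer node names
    return {
        labels[i]: [labels[j] for j in range(n) if adjacency_matrix[i][j]]
        for i in range(n)
    }


matrix = [
    [0, 1, 1],   # X -> Z, X -> Y
    [0, 0, 1],   # Z -> Y
    [0, 0, 0],   # Y has no children
]
adjacency = matrix_to_adjacency_list(matrix, labels=["X", "Z", "Y"])
```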

dowhy.utils.graph_operations.adjacency_matrix_to_graph(adjacency_matrix, labels=None)[source]

Convert a given graph adjacency matrix to DOT format.

  • adjacency_matrix – A numpy array representing the graph adjacency matrix.

  • labels – List of labels.


Graph in DOT format.
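A minimal sketch of such a conversion follows. It assumes a nested-list matrix in which a non-zero entry at row i, column j denotes the edge i –> j, and it emits one plausible DOT layout; the exact string produced by DoWhy may be formatted differently, and `matrix_to_dot` is a hypothetical name.

```python
def matrix_to_dot(adjacency_matrix, labels=None):
    """Render a directed graph given as an adjacency matrix in DOT syntax."""
    n = len(adjacency_matrix)
    if labels is None:
        labels = [str(i) for i in range(n)]
    lines = ["digraph {"]
    lines += [f"  {name};" for name in labels]          # declare every node
    for i in range(n):
        for j in range(n):
            if adjacency_matrix[i][j]:
                lines.append(f"  {labels[i]} -> {labels[j]};")
    lines.append("}")
    return "\n".join(lines)


dot = matrix_to_dot([[0, 1], [0, 0]], labels=["T", "Y"])
print(dot)
```

Labels containing spaces or punctuation would need to be quoted to remain valid DOT identifiers; the sketch does not handle that case.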


Converts the input daggity_string to valid DOT graph format.


daggity_string – Output graph from the DAGitty site


DOT string

dowhy.utils.graph_operations.del_edge(i, j, g)[source]

Deletes the edge i –> j in the graph, g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.

dowhy.utils.graph_operations.find_ancestor(node_set, node_names, adjacency_matrix, node2idx, idx2node)[source]

Finds ancestors of a given set of nodes in a given graph.

  • node_set – Set of nodes whose ancestors must be obtained.

  • node_names – Name of all nodes in the graph.

  • adjacency_matrix – Graph adjacency matrix.

  • node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

  • idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.


OrderedSet containing ancestors of all nodes in the node_set.
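The ancestor search can be sketched as a backwards traversal over the adjacency matrix. The example assumes that a non-zero entry at row i, column j means i –> j, that each node counts as its own ancestor (a common convention in identification algorithms), and that a plain list stands in for the OrderedSet return type. None of these details is confirmed by the text above, and the `node_names` argument is omitted because the sketch does not need it.

```python
def find_ancestors(node_set, adjacency_matrix, node2idx, idx2node):
    """Collect node_set together with every node that has a directed path into it."""
    n = len(adjacency_matrix)
    ancestors = []                      # a list keeps discovery order
    frontier = list(node_set)
    while frontier:
        node = frontier.pop()
        if node in ancestors:
            continue
        ancestors.append(node)
        col = node2idx[node]
        # Every row with a non-zero entry in this column is a parent of node.
        frontier.extend(idx2node[row] for row in range(n) if adjacency_matrix[row][col])
    return ancestors


names = ["X", "Z", "Y", "W"]            # X -> Z -> Y, and W is isolated
node2idx = {name: i for i, name in enumerate(names)}
idx2node = {i: name for i, name in enumerate(names)}
matrix = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
ancestors_of_y = find_ancestors(["Y"], matrix, node2idx, idx2node)
```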

dowhy.utils.graph_operations.find_c_components(adjacency_matrix, node_set, idx2node)[source]

Obtain C-components in a graph.

  • adjacency_matrix – Graph adjacency matrix.

  • node_set – Set of nodes whose ancestors must be obtained.

  • idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.


List of C-components in the graph.

dowhy.utils.graph_operations.find_predecessor(i, j, g)[source]

Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.


Randomly generates a pair of nodes.


Generates a simple-ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 –> 1 –> … –> n-1.

dowhy.utils.graph_operations.induced_graph(node_set, adjacency_matrix, node2idx)[source]

Obtains the induced graph corresponding to a subset of nodes.

  • node_set – Set of nodes whose ancestors must be obtained.

  • adjacency_matrix – Graph adjacency matrix.

  • node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.


Numpy array representing the adjacency matrix of the induced graph.
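Taking an induced subgraph amounts to keeping only the rows and columns that belong to the chosen nodes. The sketch below uses nested lists instead of a numpy array and orders the result by the nodes' original indices; both choices, and the name `induced_adjacency`, are assumptions made for illustration.

```python
def induced_adjacency(node_set, adjacency_matrix, node2idx):
    """Adjacency matrix restricted to the rows and columns of node_set."""
    idx = sorted(node2idx[node] for node in node_set)
    return [[adjacency_matrix[i][j] for j in idx] for i in idx]


node2idx = {"X": 0, "Z": 1, "Y": 2}
matrix = [
    [0, 1, 1],   # X -> Z, X -> Y
    [0, 0, 1],   # Z -> Y
    [0, 0, 0],
]
# Dropping Z leaves only the direct edge X -> Y.
sub = induced_adjacency({"X", "Y"}, matrix, node2idx)
```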


Checks if the directed acyclic graph is connected.


Converts input string from graphviz library to valid DOT graph format.


string – Graph in DOT format.


DOT string converted to a suitable format for the DoWhy library.

dowhy.utils.ordered_set module

class dowhy.utils.ordered_set.OrderedSet(elements=None)[source]

Bases: object

Python class for ordered set. Code taken from


Function to add an element to the set if it is not already present.


element – element to be added.


Function to remove elements in self._set which are also present in other_set.


other_set – The set to obtain difference with. Can be a list, set or OrderedSet.


New OrderedSet representing the difference of elements in the self._set and other_set.


Function to return list of all elements in the set.


List of all items in the set.


Function to compute the intersection of self._set and other_set.


other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.


New OrderedSet representing the set with elements common to the OrderedSet object and other_set.


Function to determine if the set is empty or not.


True if the set is empty, False otherwise.


Function to compute the union of self._set and other_set.


other_set – The set to obtain union with. Can be a list, set or OrderedSet.


New OrderedSet representing the set with elements from the OrderedSet object and other_set.
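The operations described for this class can be illustrated with a small stand-in built on dictionary keys, which preserve insertion order in Python 3.7 and later. `MiniOrderedSet` is a hypothetical class written for this example, not the DoWhy implementation; its method names mirror the descriptions above only as far as those descriptions go.

```python
class MiniOrderedSet:
    """Set that remembers insertion order, backed by dict keys."""

    def __init__(self, elements=None):
        self._items = dict.fromkeys(elements or [])

    @staticmethod
    def _elements(other):
        # Accept another MiniOrderedSet or any iterable (list, set, ...).
        return other.get_all() if isinstance(other, MiniOrderedSet) else list(other)

    def add(self, element):
        self._items.setdefault(element, None)   # no-op if already present

    def get_all(self):
        return list(self._items)

    def is_empty(self):
        return not self._items

    def union(self, other):
        return MiniOrderedSet(self.get_all() + self._elements(other))

    def intersection(self, other):
        keep = set(self._elements(other))
        return MiniOrderedSet([e for e in self._items if e in keep])

    def difference(self, other):
        drop = set(self._elements(other))
        return MiniOrderedSet([e for e in self._items if e not in drop])


s = MiniOrderedSet(["b", "a", "c"])
s.add("a")                                   # already present, order unchanged
merged = s.union(["d", "a"]).get_all()
common = s.intersection(["c", "b"]).get_all()
rest = s.difference(MiniOrderedSet(["a"])).get_all()
```

Each operation returns a new object and keeps the order of the left-hand set. When the argument is a built-in `set`, the order of any newly added elements in a union follows that set's arbitrary iteration order.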

dowhy.utils.propensity_score module

dowhy.utils.propensity_score.binarize_discrete(data, covariates, variable_types)[source]
dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.get_type_string(variables, variable_types)[source]
dowhy.utils.propensity_score.propensity_of_treatment_score(data, covariates, treatment, model='logistic', variable_types=None)[source]
dowhy.utils.propensity_score.state_propensity_score(data, covariates, treatments, variable_types=None)[source]

dowhy.utils.regression module


Creates a list of polynomial functions


max_degree – degree of the polynomial function to be created


list of lambda functions
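A list of polynomial lambdas can be built as in the sketch below. Which degrees the DoWhy helper actually includes is not stated above, so starting at degree 1 is an assumption, as is the name `make_polynomial_functions`.

```python
def make_polynomial_functions(max_degree):
    """Return [x -> x**1, x -> x**2, ..., x -> x**max_degree]."""
    # Binding d as a default argument freezes its value for each lambda;
    # a bare `lambda x: x ** d` would see only the final value of d.
    return [lambda x, d=d: x ** d for d in range(1, max_degree + 1)]


powers = make_polynomial_functions(3)
values = [f(2) for f in powers]
```

The default-argument binding matters here: closures created in a loop or comprehension look up the loop variable when they are called, not when they are defined.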

dowhy.utils.regression.generate_moment_function(W, g)[source]

Generates and returns the moment function m(W,g) = g(1,W) - g(0,W) for the Average Causal Effect.
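The moment function contrasts the regression prediction under treatment with the prediction under no treatment at the same covariate values. The sketch below assumes a regression function called as `g(treatment, w)` with a scalar covariate, which may not match how DoWhy passes its arguments; `ace_moment` and `toy_outcome_model` are hypothetical names and the coefficients are made up.

```python
def ace_moment(w, g):
    """m(W, g) = g(1, W) - g(0, W): treated minus untreated prediction."""
    return g(1, w) - g(0, w)


def toy_outcome_model(treatment, w):
    # Hypothetical fitted regression with a treatment-covariate interaction.
    return 2.0 * treatment + 0.5 * w + 1.5 * treatment * w


covariates = [0.0, 1.0, 2.0]
moments = [ace_moment(w, toy_outcome_model) for w in covariates]
average_effect = sum(moments) / len(moments)
```

Averaging the moment over the sample gives a plug-in estimate of the average causal effect under the assumed regression function.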

dowhy.utils.regression.get_generic_regressor(cv, X, Y, max_degree=3, estimator_list=None, estimator_param_list=None, numeric_features=None)[source]

Finds the best estimator for regression function (g_s)

  • cv – training and testing data indices obtained after K-folding the dataset

  • X – regressors data for training the regression model

  • Y – outcome data for training the regression model

  • max_degree – degree of the polynomial function used to approximate the regression function

  • estimator_list – list of estimator objects for finding the regression function

  • estimator_param_list – list of dictionaries with parameters for tuning respective estimators in estimator_list

  • numeric_features – list of indices of numeric features in the dataset


estimator for Riesz Regression function


Finds the numeric feature columns in a dataset


X – pandas dataframe

returns: list of indices of numeric features

Module contents