# dowhy.utils package

## dowhy.utils.api module

dowhy.utils.api.parse_state(state)[source]

## dowhy.utils.cit module

dowhy.utils.cit.compute_ci(r=None, nx=None, ny=None, confidence=0.95)[source]

Computes parametric confidence intervals around a correlation coefficient. See: https://online.stat.psu.edu/stat505/lesson/6/6.3

This is done by applying Fisher’s r-to-z transform: z = 0.5 * ln((1 + r)/(1 - r)) = arctanh(r).

The standard error is 1/sqrt(N - 3), where N is the sample size.

For a two-tailed test, the critical value of the normal distribution at the given confidence level is the absolute value of stats.norm.ppf((1 - confidence)/2).

The lower and upper confidence limits in z space are calculated as z ± critical value * standard error.

The confidence interval is then converted back to r space.

Parameters
• r – Correlation coefficient.

• nx – Length of vector x.

• ny – Length of vector y.

• confidence – Confidence level (0.95 = 95%).

Returns

Array containing the confidence interval.
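The steps described for `compute_ci` can be sketched with the standard library alone. This is an illustration under stated assumptions, not the DoWhy implementation: `fisher_ci` is a hypothetical name, it takes a single sample size `n` instead of `nx` and `ny`, and `statistics.NormalDist` stands in for `stats.norm`.

```python
import math
from statistics import NormalDist


def fisher_ci(r, n, confidence=0.95):
    """Confidence interval for a correlation coefficient via Fisher's z (sketch)."""
    z = math.atanh(r)                    # r space -> z space
    se = 1.0 / math.sqrt(n - 3)          # standard error in z space
    # Two-tailed critical value of the standard normal distribution.
    crit = abs(NormalDist().inv_cdf((1 - confidence) / 2))
    lower, upper = z - crit * se, z + crit * se
    return math.tanh(lower), math.tanh(upper)   # z space -> r space


low, high = fisher_ci(r=0.5, n=50)
print(low, high)
```

Because the interval is symmetric in z space but not in r space, the limits are generally not equidistant from r unless r is 0.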

dowhy.utils.cit.conditional_MI(data=None, x=None, y=None, z=None)[source]

Returns the conditional mutual information between X and Y given Z:

I(X, Y | Z) = H(X|Z) - H(X|Y,Z)

= H(X,Z) - H(Z) - H(X,Y,Z) + H(Y,Z)

= H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)

Parameters
• data – Dataset.

• x, y, z – Column names from the dataset.

Returns

Conditional mutual information between X and Y given Z.
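The entropy identity above can be checked on small discrete samples. The sketch below is a plug-in estimate from empirical frequencies, assuming natural logarithms and plain Python sequences rather than dataset columns; `joint_entropy` and `conditional_mi` are hypothetical helper names.

```python
import math
from collections import Counter


def joint_entropy(*columns):
    """Empirical Shannon entropy (in nats) of the joint distribution of the columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((c / n) * math.log(c / n) for c in Counter(rows).values())


def conditional_mi(x, y, z):
    """I(X, Y | Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (joint_entropy(x, z) + joint_entropy(y, z)
            - joint_entropy(x, y, z) - joint_entropy(z))


x = [0, 1, 0, 1]
constant = [0, 0, 0, 0]
# With an uninformative Z, I(X, X | Z) reduces to H(X).
shared = conditional_mi(x, x, constant)
# Conditioning on Z = X leaves nothing for Y to explain.
explained = conditional_mi(x, x, x)
```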

dowhy.utils.cit.entropy(x)[source]

Returns the entropy of a random variable x: H(x) = - Σ p(x)log(p(x))

Parameters

x – Random variable to calculate entropy for.

Returns

Entropy of the random variable.
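As a worked instance of the formula, the following sketch estimates p(x) from observed frequencies. The name `empirical_entropy` and the choice of the natural logarithm are assumptions made for illustration; the documentation does not state the base.

```python
import math
from collections import Counter


def empirical_entropy(values):
    """H(x) = -sum p(x) log p(x), with p(x) taken from observed frequencies."""
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in Counter(values).values())


fair_coin = empirical_entropy(["h", "t", "h", "t"])   # two equally likely outcomes
constant = empirical_entropy(["h", "h", "h", "h"])    # a single certain outcome
```

A variable with k equally likely values has entropy log(k), and a constant has entropy 0.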

dowhy.utils.cit.partial_corr(data=None, x=None, y=None, z=None, method='pearson')[source]

Calculates the partial correlation, which is the degree of association between x and y after removing the effect of z. This is done by calculating the correlation coefficient between the residuals of two linear regressions: x ~ z and y ~ z. See: https://en.wikipedia.org/wiki/Partial_correlation

Parameters
• data – pandas DataFrame.

• x – Column name in data.

• y – Column name in data.

• z – String or list.

• method – String denoting the correlation type: “pearson” or “spearman”.

Returns

A Python dictionary with the following keys:

• n – Sample size.

• r – Partial correlation coefficient.

• CI95 – 95% parametric confidence intervals.

• p-val – p-value.
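The residual-based definition can be illustrated with a small standard-library sketch. It assumes a single covariate z and Pearson correlation only, and works on plain lists instead of DataFrame columns; `residuals`, `pearson`, and `partial_corr_sketch` are hypothetical names, not part of DoWhy.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def residuals(target, covariate):
    """Residuals of the least-squares fit target ~ covariate (with intercept)."""
    mt, mc = _mean(target), _mean(covariate)
    sxx = sum((c - mc) ** 2 for c in covariate)
    sxy = sum((c - mc) * (t - mt) for c, t in zip(covariate, target))
    slope = sxy / sxx
    intercept = mt - slope * mc
    return [t - (intercept + slope * c) for c, t in zip(covariate, target)]


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = _mean(a), _mean(b)
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    var_a = sum((p - ma) ** 2 for p in a)
    var_b = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr_sketch(x, y, z):
    """Correlation between the residuals of x ~ z and y ~ z."""
    return pearson(residuals(x, z), residuals(y, z))


# x and y both depend on z; their remaining noise terms are uncorrelated.
z = [1.0, 2.0, 3.0, 4.0]
x = [2 * zi + e for zi, e in zip(z, [1, -1, -1, 1])]
y = [3 * zi + e for zi, e in zip(z, [-1, 3, -3, 1])]
raw = pearson(x, y)
partial = partial_corr_sketch(x, y, z)
```

In this constructed example the raw correlation between x and y is clearly positive, while the partial correlation is zero up to floating-point error, because all of their shared variation comes from z.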

## dowhy.utils.cli_helpers module

dowhy.utils.cli_helpers.query_yes_no(question, default=True)[source]

If invalid input is given, the user is asked again until they give valid input.

Side Effects: Blocks program execution until valid input(y/n) is given.

Parameters
• question(str) – A question that is presented to the user.

• default(bool|None) – The default value when enter is pressed with no value. When None, there is no default value and the query will loop.

Returns

A bool indicating whether user has entered yes or no.
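The prompting behaviour described above might look like the following sketch. The accepted spellings of yes and no, the prompt suffix, and the `input_fn` parameter (added so the loop can be exercised without a terminal) are assumptions for illustration; `ask_yes_no` is a hypothetical name.

```python
def ask_yes_no(question, default=True, input_fn=input):
    """Keep asking until the reply is a recognisable yes or no."""
    answers = {"y": True, "yes": True, "n": False, "no": False}
    while True:
        reply = input_fn(f"{question} [y/n] ").strip().lower()
        if reply == "" and default is not None:
            return default           # bare Enter selects the default
        if reply in answers:
            return answers[reply]
        # Anything else, including bare Enter when default is None, re-prompts.


# Scripted replies: an invalid answer, an empty answer, then a valid one.
scripted = iter(["maybe", "", "y"])
answer = ask_yes_no("Continue?", default=None, input_fn=lambda prompt: next(scripted))
```

With `default=None` the empty reply is rejected like any other invalid input, which matches the documented looping behaviour.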

## dowhy.utils.dgp module

class dowhy.utils.dgp.DataGeneratingProcess(**kwargs)[source]

Bases: `object`

Base class for implementation of data generating process.

Subclasses implement functions that create various data generating processes. All data generating processes are in the package “dowhy.utils.dgps”.

DEFAULT_PERCENTILE = 0.9
convert_to_binary(data, deterministic=False)[source]
generate_data()[source]
generation_process()[source]

## dowhy.utils.graph_operations module

Adds an edge i –> j to the graph, g. The edge is only added if this addition does NOT cause the graph to have cycles.
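One way to implement the "add only if no cycle results" rule is to reject the edge whenever j already reaches i. The sketch below assumes a plain dictionary of successor sets instead of whatever graph object the library uses; `has_path` and `add_edge_if_acyclic` are hypothetical names.

```python
def has_path(graph, start, goal):
    """Depth-first search: is there a directed path from start to goal?"""
    stack, seen = [start], set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, ()))
    return False


def add_edge_if_acyclic(graph, i, j):
    """Add i -> j unless j already reaches i, which would close a cycle."""
    if has_path(graph, j, i):
        return False
    graph.setdefault(i, set()).add(j)
    graph.setdefault(j, set())
    return True


g = {}
first = add_edge_if_acyclic(g, 0, 1)     # accepted
second = add_edge_if_acyclic(g, 1, 2)    # accepted
closing = add_edge_if_acyclic(g, 2, 0)   # rejected: 0 -> 1 -> 2 -> 0 is a cycle
```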

Parameters

• labels – List of labels.

Returns

Convert a given graph adjacency matrix to DOT format.

Parameters

• labels – List of labels.

Returns

Graph in DOT format.
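A conversion from an adjacency matrix to DOT text could be sketched as follows. The convention that a nonzero entry at row i, column j means an edge i -> j, the quoting of node names, and the name `matrix_to_dot` are assumptions; the exact DOT layout produced by the library may differ.

```python
def matrix_to_dot(matrix, labels=None):
    """Render a square adjacency matrix (list of lists) as a DOT digraph string."""
    n = len(matrix)
    if labels is None:
        labels = [str(k) for k in range(n)]      # fall back to index labels
    lines = ["digraph {"]
    lines += [f'  "{name}";' for name in labels]  # declare every node
    for i in range(n):
        for j in range(n):
            if matrix[i][j]:                      # nonzero entry: edge i -> j
                lines.append(f'  "{labels[i]}" -> "{labels[j]}";')
    lines.append("}")
    return "\n".join(lines)


dot = matrix_to_dot([[0, 1], [0, 0]], labels=["a", "b"])
print(dot)
```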

dowhy.utils.graph_operations.convert_to_undirected_graph(g)[source]
dowhy.utils.graph_operations.daggity_to_dot(daggity_string)[source]

Converts the input daggity_string to valid DOT graph format.

Parameters

daggity_string – Output graph from Daggity site

Returns

DOT string

dowhy.utils.graph_operations.del_edge(i, j, g)[source]

Deletes the edge i –> j in the graph, g. The edge is only deleted if this removal does NOT cause the graph to be disconnected.
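The guarded deletion can be sketched by testing connectivity on the graph without the edge before committing to the removal. This sketch assumes that "connected" means weakly connected (edge directions ignored) and represents the graph as a node list plus a set of (i, j) pairs; both function names are hypothetical.

```python
def is_weakly_connected(nodes, edges):
    """True if every node is reachable from any other when directions are ignored."""
    nodes = set(nodes)
    if not nodes:
        return True
    neighbours = {v: set() for v in nodes}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    start = next(iter(nodes))
    stack, seen = [start], {start}
    while stack:
        for nxt in neighbours[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen == nodes


def delete_edge_if_connected(nodes, edges, i, j):
    """Return the edges without (i, j), or unchanged if removal would disconnect."""
    remaining = set(edges) - {(i, j)}
    if (i, j) in edges and is_weakly_connected(nodes, remaining):
        return remaining
    return set(edges)


nodes = [0, 1, 2]
edges = {(0, 1), (1, 2), (0, 2)}
after_first = delete_edge_if_connected(nodes, edges, 0, 2)         # removed
after_second = delete_edge_if_connected(nodes, after_first, 0, 1)  # kept
```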

Finds ancestors of a given set of nodes in a given graph.

Parameters
• node_set – Set of nodes whose ancestors must be obtained.

• node_names – Name of all nodes in the graph.

• node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

• idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns

OrderedSet containing ancestors of all nodes in the node_set.
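An ancestor search over an adjacency matrix can be sketched as a backwards traversal along parent links. The assumptions here are that a nonzero entry at row i, column j means i -> j, that the result includes the starting nodes themselves, and that a plain list (in discovery order) stands in for the OrderedSet; `find_ancestors_sketch` is a hypothetical name.

```python
def find_ancestors_sketch(node_set, adjacency, node2idx, idx2node):
    """Collect node_set plus every node with a directed path into it."""
    n = len(adjacency)
    found = []                        # a list keeps discovery order
    stack = list(node_set)
    while stack:
        name = stack.pop()
        if name in found:
            continue
        found.append(name)
        col = node2idx[name]
        # Parents of `name` are the rows with a nonzero entry in its column.
        stack.extend(idx2node[row] for row in range(n) if adjacency[row][col])
    return found


names = ["a", "b", "c", "d"]          # graph: a -> b -> c, with d isolated
node2idx = {name: k for k, name in enumerate(names)}
idx2node = {k: name for k, name in enumerate(names)}
adjacency = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
ancestors_of_c = find_ancestors_sketch(["c"], adjacency, node2idx, idx2node)
```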

Obtain C-components in a graph.

Parameters

• node_set – Set of nodes whose ancestors must be obtained.

• idx2node – A dictionary mapping the row or column indices in the adjacency matrix to the corresponding node names.

Returns

List of C-components in the graph.

dowhy.utils.graph_operations.find_predecessor(i, j, g)[source]

Finds a predecessor, k, in the path between two nodes, i and j, in the graph, g.

dowhy.utils.graph_operations.get_random_node_pair(n)[source]

Randomly generates a pair of nodes.

dowhy.utils.graph_operations.get_simple_ordered_tree(n)[source]

Generates a simple-ordered tree. The tree is just a directed acyclic graph of n nodes with the structure 0 –> 1 –> … –> n.

Obtains the induced graph corresponding to a subset of nodes.

Parameters
• node_set – Set of nodes whose ancestors must be obtained.

• node2idx – A dictionary mapping node names to their row or column index in the adjacency matrix.

Returns

Numpy array representing the adjacency matrix of the induced graph.
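Taking an induced subgraph amounts to keeping only the rows and columns of the selected nodes. The sketch below uses nested lists where the library returns a NumPy array, and keeps the nodes in the order given; `induced_adjacency` is a hypothetical name.

```python
def induced_adjacency(node_set, adjacency, node2idx):
    """Adjacency matrix restricted to the rows and columns of node_set."""
    idx = [node2idx[name] for name in node_set]
    return [[adjacency[r][c] for c in idx] for r in idx]


node2idx = {"a": 0, "b": 1, "c": 2}    # graph: a -> b -> c
adjacency = [
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
]
sub_ab = induced_adjacency(["a", "b"], adjacency, node2idx)   # keeps a -> b
sub_ac = induced_adjacency(["a", "c"], adjacency, node2idx)   # no edge survives
```

Dropping b removes the only path between a and c, so the second result has no edges: an induced graph keeps only edges whose endpoints are both selected.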

dowhy.utils.graph_operations.is_connected(g)[source]

Checks if the directed acyclic graph is connected.

dowhy.utils.graph_operations.str_to_dot(string)[source]

Converts input string from graphviz library to valid DOT graph format.

Parameters

string – Graph in DOT format.

Returns

DOT string converted to a suitable format for the DoWhy library.

## dowhy.utils.ordered_set module

class dowhy.utils.ordered_set.OrderedSet(elements=None)[source]

Bases: `object`

Python class for ordered set. Code taken from https://github.com/buyalsky/ordered-hash-set/tree/5198b23e01faeac3f5398ab2c08cb013d14b3702.

Function to add an element to the set if it does not exist.

Parameters

element – element to be added.

difference(other_set)[source]

Function to remove elements in self._set which are also present in other_set.

Parameters

other_set – The set to obtain difference with. Can be a list, set or OrderedSet.

Returns

New OrderedSet representing the difference of elements in the self._set and other_set.

get_all()[source]

Function to return list of all elements in the set.

Returns

List of all items in the set.

intersection(other_set)[source]

Function to compute the intersection of self._set and other_set.

Parameters

other_set – The set to obtain intersection with. Can be a list, set or OrderedSet.

Returns

New OrderedSet representing the set with elements common to the OrderedSet object and other_set.

is_empty()[source]

Function to determine if the set is empty or not.

Returns

`True` if the set is empty, `False` otherwise.

union(other_set)[source]

Function to compute the union of self._set and other_set.

Parameters

other_set – The set to obtain union with. Can be a list, set or OrderedSet.

Returns

New OrderedSet representing the set with elements from the OrderedSet object and other_set.
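The documented operations can be mimicked with an insertion-ordered dictionary. `TinyOrderedSet` below is a hypothetical stand-in written for illustration, not the class shipped in `dowhy.utils.ordered_set`; in particular, the ordering of results from `union`, `intersection`, and `difference` is an assumption.

```python
class TinyOrderedSet:
    """Minimal insertion-ordered set backed by a dict (illustrative sketch)."""

    def __init__(self, elements=None):
        self._items = dict.fromkeys(elements or [])

    def add(self, element):
        self._items.setdefault(element, None)    # no-op if already present

    def get_all(self):
        return list(self._items)

    def is_empty(self):
        return not self._items

    @staticmethod
    def _as_list(other):
        # Accept a list, a set, or another TinyOrderedSet.
        return other.get_all() if isinstance(other, TinyOrderedSet) else list(other)

    def union(self, other):
        return TinyOrderedSet(self.get_all() + self._as_list(other))

    def intersection(self, other):
        keep = set(self._as_list(other))
        return TinyOrderedSet(e for e in self._items if e in keep)

    def difference(self, other):
        drop = set(self._as_list(other))
        return TinyOrderedSet(e for e in self._items if e not in drop)


s = TinyOrderedSet([3, 1, 2])
s.add(1)          # already present, order unchanged
s.add(5)          # appended at the end
merged = s.union([2, 9]).get_all()
```

Each operation returns a new object and leaves the original untouched, mirroring the "New OrderedSet" wording in the descriptions above.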

## dowhy.utils.propensity_score module

dowhy.utils.propensity_score.binarize_discrete(data, covariates, variable_types)[source]
dowhy.utils.propensity_score.binary_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.categorical_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.continuous_treatment_model(data, covariates, treatment, variable_types)[source]
dowhy.utils.propensity_score.discrete_to_integer(discrete)[source]
dowhy.utils.propensity_score.get_type_string(variables, variable_types)[source]
dowhy.utils.propensity_score.propensity_of_treatment_score(data, covariates, treatment, model='logistic', variable_types=None)[source]
dowhy.utils.propensity_score.state_propensity_score(data, covariates, treatments, variable_types=None)[source]