Independence Tests =================== Assuming we have the following data: >>> import numpy as np, pandas as pd >>> >>> X = np.random.normal(loc=0, scale=1, size=1000) >>> Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000) >>> Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000) >>> data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z)) To test whether :math:`X` is conditionally independent of :math:`Z` given :math:`Y` using the `kernel dependence measure `_, all you need to do is: >>> import dowhy.gcm as gcm >>> >>> # Null hypothesis: x is independent of y given z >>> p_value = gcm.independence_test(X, Z, conditioned_on=Y) >>> p_value 0.48386151342564865 If we define a threshold of 0.05 (as is often done as a good default), and the p-value is clearly above this, it says :math:`X` and :math:`Z` are indeed independent when we condition on :math:`Y`. This is what we would expect, given that we generated the data using the causal graph :math:`X \rightarrow Y \rightarrow Z`, where Z is conditionally independent of :math:`X` given :math:`Y`. To test whether :math:`X` is independent of :math:`Z` (*without* conditioning on :math:`Y`), we can use the same function without the third argument. >>> # Null hypothesis: x is independent of y >>> p_value = gcm.independence_test(X, Z) >>> p_value 0.0 Again, we can define a threshold of 0.05, but this time the p-value is clearly below this threshold. This says :math:`X` and :math:`Z` *are* dependent on each other. Again, this is what we would expect, since :math:`Z` is dependent on :math:`Y` and :math:`Y` is dependent on :math:`X`, but we don't condition on :math:`Y`.