# Independence Tests

Suppose we have the following data:

```
>>> import numpy as np, pandas as pd
>>>
>>> X = np.random.normal(loc=0, scale=1, size=1000)
>>> Y = 2 * X + np.random.normal(loc=0, scale=1, size=1000)
>>> Z = 3 * Y + np.random.normal(loc=0, scale=1, size=1000)
>>> data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))
```
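The snippet above draws from NumPy's global random state without a seed, so the samples, and therefore the p-values shown on this page, will differ from run to run. A minimal sketch of a reproducible variant follows; the seed value `42` and the names `rng` and `n` are arbitrary choices for illustration, not part of the original example.

```python
import numpy as np
import pandas as pd

# Seeded generator: the seed value is an arbitrary choice for this sketch.
rng = np.random.default_rng(seed=42)
n = 1000

# Same chain as above: X -> Y -> Z, each step adding standard normal noise.
X = rng.normal(loc=0, scale=1, size=n)
Y = 2 * X + rng.normal(loc=0, scale=1, size=n)
Z = 3 * Y + rng.normal(loc=0, scale=1, size=n)
data = pd.DataFrame(data=dict(X=X, Y=Y, Z=Z))

print(data.shape)
```

Re-running this block yields identical arrays each time, which makes it easier to compare test results across sessions.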

To test whether \(X\) is conditionally independent of \(Z\) given \(Y\) using the kernel dependence measure, all you need to do is:

```
>>> import dowhy.gcm as gcm
>>>
>>> # Null hypothesis: X is independent of Z given Y
>>> p_value = gcm.independence_test(X, Z, conditioned_on=Y)
>>> p_value
0.48386151342564865
```

With a significance threshold of 0.05 (a common default), the p-value is clearly above the threshold, so we cannot reject the null hypothesis: the data are consistent with \(X\) and \(Z\) being independent when we condition on \(Y\). This is what we would expect, given that we generated the data using the causal graph \(X \rightarrow Y \rightarrow Z\), where \(Z\) is conditionally independent of \(X\) given \(Y\).
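Because this particular data set is linear with Gaussian noise, the same conclusion can be cross-checked with a simple partial correlation. The sketch below is not what `gcm.independence_test` does internally; it is a NumPy-only sanity check that relies on the linear-Gaussian assumption, and the helper name `residualize` and the seed are hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed for this sketch
n = 1000
X = rng.normal(size=n)
Y = 2 * X + rng.normal(size=n)
Z = 3 * Y + rng.normal(size=n)


def residualize(target, given):
    """Return the part of `target` not explained by a linear fit on `given`."""
    design = np.column_stack([np.ones_like(given), given])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return target - design @ coef


# Partial correlation of X and Z given Y: correlate the two residual series.
partial_corr = np.corrcoef(residualize(X, Y), residualize(Z, Y))[0, 1]
print(f"partial correlation of X and Z given Y: {partial_corr:.3f}")
```

A partial correlation close to zero is what the chain \(X \rightarrow Y \rightarrow Z\) implies. For nonlinear or non-Gaussian data this check is not reliable, which is where a kernel-based test is the better tool.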

To test whether \(X\) is independent of \(Z\) (*without* conditioning on \(Y\)), we can
use the same function and omit the `conditioned_on` argument.

```
>>> # Null hypothesis: X is independent of Z
>>> p_value = gcm.independence_test(X, Z)
>>> p_value
0.0
```

Using the same threshold of 0.05, the p-value is now clearly below it, so we reject the null hypothesis.
This says \(X\) and \(Z\) *are* dependent on each other. Again, this is what we would
expect, since \(Z\) depends on \(Y\) and \(Y\) depends on \(X\), and this time we
don’t condition on \(Y\).
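
The strength of this marginal dependence can be worked out directly from the data-generating equations. Writing \(N_Y\) and \(N_Z\) (names introduced here for illustration) for the independent standard normal noise terms added to \(Y\) and \(Z\), we get:

\[
Z = 3Y + N_Z = 3(2X + N_Y) + N_Z = 6X + 3N_Y + N_Z
\]

\[
\operatorname{Cov}(X, Z) = 6 \operatorname{Var}(X) = 6, \qquad
\operatorname{Var}(Z) = 36 + 9 + 1 = 46
\]

\[
\rho_{X,Z} = \frac{\operatorname{Cov}(X, Z)}{\sqrt{\operatorname{Var}(X)\operatorname{Var}(Z)}} = \frac{6}{\sqrt{46}} \approx 0.885
\]

With a population correlation this large and 1000 samples, a p-value that is numerically indistinguishable from zero is unsurprising.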