Independence testing
Independence testing API
The independence test API is defined by
CausalityTools.independence (Function)
independence(test::IndependenceTest, x, y, [z]) → summary
Perform the given `IndependenceTest` `test` on data `x`, `y` and `z`. If only `x` and `y` are given, `test` must provide a bivariate association measure. If `z` is given too, then `test` must provide a conditional association measure.
Returns a test `summary`, whose type depends on `test`.
Compatible independence tests
CausalityTools.IndependenceTest (Type)
IndependenceTest <: IndependenceTest
The supertype for all independence tests.
SurrogateTest
CausalityTools.SurrogateTest (Type)
SurrogateTest <: IndependenceTest
SurrogateTest(measure, [est];
nshuffles::Int = 100,
surrogate = RandomShuffle(),
rng = Random.default_rng(),
)
A generic (conditional) independence test for assessing whether two variables `X` and `Y` are independent, potentially conditioned on a third variable `Z`, based on surrogate data.
When used with `independence`, a `SurrogateTestResult` is returned.
Description
This is a generic one-sided hypothesis test that checks whether `x` and `y` are independent (given `z`, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.
There are different ways of shuffling, dictated by `surrogate`, each representing a distinct null hypothesis. For each shuffle, the provided `measure` is computed (using `est`, if relevant). This procedure is repeated `nshuffles` times, and a test summary is returned. The shuffled variable is always the first variable (`X`). Exceptions are:
- If `measure` is a `TransferEntropy` measure such as `TEShannon`, then the source variable is always shuffled, and the target and conditional variable are left unshuffled.
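The shuffling procedure described above can be sketched with the Julia standard library alone. This is a minimal illustration, not the package's implementation: the helper name `surrogate_pvalue` is hypothetical, the absolute Pearson correlation stands in for `measure`, and plain random shuffling stands in for `surrogate`. The p-value convention (fraction of surrogate values at least as large as the observed value) is likewise an assumption.

```julia
using Random, Statistics

# Hypothetical helper (not part of CausalityTools): a surrogate-based independence
# test using |Pearson correlation| as the measure and random shuffling as the
# surrogate method.
function surrogate_pvalue(x, y; nshuffles::Int = 100, rng = Random.default_rng())
    m = abs(cor(x, y))                      # measure on the original data
    m_surr = zeros(nshuffles)               # measure on each shuffled data set
    for k in 1:nshuffles
        x_shuffled = shuffle(rng, x)        # shuffle the first variable only
        m_surr[k] = abs(cor(x_shuffled, y))
    end
    # One-sided p-value: fraction of surrogate values at least as large as `m`.
    pvalue = count(m .<= m_surr) / nshuffles
    return (m = m, m_surr = m_surr, pvalue = pvalue)
end

rng = MersenneTwister(1234)
x = randn(rng, 200)
y = x .+ 0.1 .* randn(rng, 200)             # y depends strongly on x
res = surrogate_pvalue(x, y; nshuffles = 100, rng = rng)
```

With data this strongly coupled, the observed measure should exceed essentially all surrogate values, so `res.pvalue` is expected to be small. For independent inputs the p-value is spread out over the unit interval.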
Compatible measures
Measure | Pairwise | Conditional | Requires est |
---|---|---|---|
PearsonCorrelation | ✓ | ✖ | No |
DistanceCorrelation | ✓ | ✓ | No |
SMeasure | ✓ | ✖ | No |
HMeasure | ✓ | ✖ | No |
MMeasure | ✓ | ✖ | No |
LMeasure | ✓ | ✖ | No |
PairwiseAsymmetricInference | ✓ | ✖ | Yes |
ConvergentCrossMapping | ✓ | ✖ | Yes |
MIShannon | ✓ | ✖ | Yes |
MIRenyiJizba | ✓ | ✖ | Yes |
MIRenyiSarbu | ✓ | ✖ | Yes |
MITsallisMartin | ✓ | ✖ | Yes |
MITsallisFuruichi | ✓ | ✖ | Yes |
PartialCorrelation | ✖ | ✓ | Yes |
CMIShannon | ✖ | ✓ | Yes |
CMIRenyiJizba | ✖ | ✓ | Yes |
TEShannon | ✓ | ✓ | Yes |
TERenyiJizba | ✓ | ✓ | Yes |
PMI | ✖ | ✓ | Yes |
Examples
CausalityTools.SurrogateTestResult (Type)
SurrogateTestResult(m, m_surr, pvalue)
Holds the result of a `SurrogateTest`. `m` is the measure computed on the original data. `m_surr` is a vector of the measure computed on permuted data, where `m_surr[i]` is the measure computed on the `i`-th permutation. `pvalue` is the one-sided `p`-value for the test.
LocalPermutationTest
CausalityTools.LocalPermutationTest (Type)
LocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
kperm::Int = 5,
nshuffles::Int = 100,
rng = Random.default_rng(),
replace = true,
w::Int = 0)
`LocalPermutationTest` is a generic conditional independence test (Runge, 2018) for assessing whether two variables `X` and `Y` are conditionally independent given a third variable `Z` (all of which may be multivariate).
When used with `independence`, a `LocalPermutationTestResult` is returned.
Description
This is a generic one-sided hypothesis test that checks whether `X` and `Y` are independent (given `Z`, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between `x` and `y`, but preserve dependencies between `x` and `z`.
The algorithm is as follows:
- Compute the original conditional independence statistic `I(X; Y | Z)`.
- Allocate a scalar-valued vector `Î` with space for `nshuffles` elements.
- For `k ∈ [1, 2, …, nshuffles]`, repeat:
  - For each `zᵢ ∈ Z`, let `nᵢ` be the time indices of the `kperm` nearest neighbors of `zᵢ`, excluding the `w` nearest neighbors of `zᵢ` from the neighbor query (i.e. `w` is the Theiler window).
  - Let `x̂ᵢ = X[j]`, where `j` is randomly sampled from `nᵢ` with replacement. This way, `xᵢ` is replaced with `xⱼ` only if `zᵢ ≈ zⱼ` (`zᵢ` and `zⱼ` are close). Repeat for `i = 1, 2, …, n` and obtain the shuffled `X̂ = [x̂₁, x̂₂, …, x̂ₙ]`.
  - Compute the conditional independence statistic `Iₖ(X̂; Y | Z)`.
  - Let `Î[k] = Iₖ(X̂; Y | Z)`.
- Compute the p-value as `count(I .<= Î) / nshuffles`, i.e. the fraction of shuffled statistics that are at least as large as the original statistic.
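The steps above can be sketched for scalar time series using only the Julia standard library. This is an illustration under stated assumptions, not the package's implementation: the helper names `neighbor_indices`, `abs_partial_cor` and `local_permutation_pvalue` are hypothetical, the absolute partial correlation stands in for the conditional statistic, neighbors are found by brute force, and the Theiler window is interpreted as excluding points within `w` time steps of `i` (so the point itself is always excluded).

```julia
using Random, Statistics

# Hypothetical helper: indices of the `kperm` nearest neighbors of z[i] (brute force),
# skipping every point within `w` time steps of i, including i itself.
function neighbor_indices(z, i, kperm, w)
    candidates = [j for j in eachindex(z) if abs(j - i) > w]
    dists = [abs(z[j] - z[i]) for j in candidates]
    return candidates[partialsortperm(dists, 1:kperm)]
end

# Stand-in conditional statistic: |partial correlation| of x and y given scalar z,
# computed by correlating the residuals of linear fits on z.
function abs_partial_cor(x, y, z)
    residual(v) = v .- mean(v) .- (cov(v, z) / var(z)) .* (z .- mean(z))
    return abs(cor(residual(x), residual(y)))
end

function local_permutation_pvalue(x, y, z; kperm = 5, nshuffles = 100, w = 0,
        rng = Random.default_rng())
    stat = abs_partial_cor(x, y, z)          # statistic on the original data
    # z is never shuffled, so the neighbor sets can be computed once.
    neighbors = [neighbor_indices(z, i, kperm, w) for i in eachindex(z)]
    stat_surr = zeros(nshuffles)
    for k in 1:nshuffles
        # Replace each x[i] by x[j], with j drawn from the neighbors of z[i].
        x_perm = [x[rand(rng, neighbors[i])] for i in eachindex(x)]
        stat_surr[k] = abs_partial_cor(x_perm, y, z)
    end
    # Fraction of shuffled statistics at least as large as the original one.
    return count(stat .<= stat_surr) / nshuffles
end

rng = MersenneTwister(42)
z = randn(rng, 200)
x = z .+ 0.5 .* randn(rng, 200)
y = x .+ z .+ 0.5 .* randn(rng, 200)         # y depends on x even given z
p_dep = local_permutation_pvalue(x, y, z; rng = rng)
```

Because each `x[i]` is only swapped with values whose `z` is similar, the shuffled series keeps its relationship with `z` while losing its direct link to `y`. In this example `y` depends on `x` beyond what `z` explains, so `p_dep` is expected to be small.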
In addition to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except neighbors in `Y` are used instead of `Z` and we use `I(X; Y)` and `Iₖ(X̂; Y)` instead of `I(X; Y | Z)` and `Iₖ(X̂; Y | Z)`.
Compatible measures
Measure | Pairwise | Conditional | Requires est |
---|---|---|---|
PartialCorrelation | ✖ | ✓ | No |
DistanceCorrelation | ✖ | ✓ | No |
CMIShannon | ✖ | ✓ | Yes |
TEShannon | ✓ | ✓ | Yes |
PMI | ✖ | ✓ | Yes |
The `LocalPermutationTest` is only defined for conditional independence testing. Exceptions are for measures like `TEShannon`, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with `LocalPermutationTest`.
The nearest-neighbor approach in Runge (2018) can be reproduced by using the `CMIShannon` measure with the `FPVP` estimator.
Examples
CausalityTools.LocalPermutationTestResult (Type)
LocalPermutationTestResult(m, m_surr, pvalue)
Holds the result of a `LocalPermutationTest`. `m` is the measure computed on the original data. `m_surr` is a vector of the measure computed on permuted data, where `m_surr[i]` is the measure computed on the `i`-th permutation. `pvalue` is the one-sided `p`-value for the test.
JointDistanceDistributionTest
CausalityTools.JointDistanceDistributionTest (Type)
JointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())
An independence test for two variables based on the `JointDistanceDistribution` (Amigó and Hirata, 2018).
When used with `independence`, a `JDDTestResult` is returned.
Description
The joint distance distribution (labelled `Δ` in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. `JointDistanceDistributionTest` formulates their method as an independence test.
Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check whether the mean of `Δ` is skewed towards positive values, i.e.
- $H_0 := \mu(\Delta) = 0$
- $H_1 := \mu(\Delta) > 0$.
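For reference, an upper-tailed one-sample t-test of this kind is conventionally based on the statistic below. This is the textbook form, given here as an assumption about the shape of the test rather than a transcription of the package internals. Here $\bar{\Delta}$ and $s_\Delta$ denote the sample mean and sample standard deviation of the $n$ values in Δ, $\mu_0$ is the hypothesized mean under $H_0$ (0 in the hypotheses above), and $F_{n-1}$ is the cumulative distribution function of Student's t-distribution with $n - 1$ degrees of freedom.
\[t = \dfrac{\bar{\Delta} - \mu_0}{s_\Delta / \sqrt{n}}, \qquad p = 1 - F_{n-1}(t).\]
A small p-value then indicates that the mean of Δ is significantly larger than $\mu_0$.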
When used with `independence`, a `JDDTestResult` is returned, which contains the joint distance distribution and a p-value. If you only need `Δ`, use `jdd` directly.
Examples
This example shows how the `JointDistanceDistributionTest` can be used in practice.
CausalityTools.JDDTestResult (Type)
JDDTestResult(Δjdd, hypothetical_μ, pvalue)
Holds the results of `JointDistanceDistributionTest`. `Δjdd` is the `Δ`-distribution, `hypothetical_μ` is the hypothetical mean of the `Δ`-distribution under the null, and `pvalue` is the p-value for the one-sided t-test.
CorrTest
CausalityTools.CorrTest (Type)
CorrTest <: IndependenceTest
CorrTest()
An independence test based on correlation (for two variables) and partial correlation (for three variables) (Levy and Narula, 1978; as described in Schmidt et al., 2018).
Uses `PearsonCorrelation` and `PartialCorrelation` internally.
Assumes that the input data are (multivariate) normally distributed. Then `ρ(X, Y) = 0` implies `X ⫫ Y` and `ρ(X, Y | 𝐙) = 0` implies `X ⫫ Y | 𝐙`.
Description
The null hypothesis is `H₀ := ρ(X, Y | 𝐙) = 0`. We use the approach in Levy and Narula (1978) and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:
\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]
To test the null hypothesis against the alternative hypothesis `H₁ := ρ(X, Y | 𝐙) > 0`, calculate
\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]
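and compute the two-sided p-value (Schmidt et al., 2018)
\[p(X, Y | \bf{Z}) = 2(1 - \phi(|\hat{Z}|)),\]
where $\phi$ is the standard normal cumulative distribution function, $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. Since $Z(0) = 0$, the statistic simplifies to $\hat{Z} = \tfrac{1}{2}\sqrt{n - d - 3}\,Z(\hat{\rho}_{XY|\bf{Z}})$. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.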
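As a worked illustration of the computation, the sketch below evaluates a two-sided Fisher z-test p-value from a given (partial) correlation coefficient. It is a standalone approximation, not the package's code: the function names `normal_cdf` and `fisher_z_pvalue` are hypothetical, the test statistic is taken to be $\tfrac{1}{2}\sqrt{n - d - 3}\,\log\frac{1 + \hat{\rho}}{1 - \hat{\rho}}$ in absolute value (the standard Fisher z-test, assumed here), and the normal CDF uses the Abramowitz-Stegun approximation of the error function so that no external packages are needed.

```julia
# Standard normal CDF via the Abramowitz-Stegun 7.1.26 approximation of erf
# (absolute error of roughly 1.5e-7), to keep the sketch dependency-free.
function normal_cdf(x)
    t = 1 / (1 + 0.3275911 * abs(x) / sqrt(2))
    poly = t * (0.254829592 + t * (-0.284496736 + t * (1.421413741 +
           t * (-1.453152027 + t * 1.061405429))))
    erf_abs = 1 - poly * exp(-x^2 / 2)       # approximates erf(|x| / sqrt(2))
    return x >= 0 ? 0.5 * (1 + erf_abs) : 0.5 * (1 - erf_abs)
end

# Two-sided p-value for an observed (partial) correlation `ρ`, with `n` samples
# and a conditioning set of dimension `d` (use d = 0 for the pairwise case).
function fisher_z_pvalue(ρ, n; d = 0)
    z = 0.5 * log((1 + ρ) / (1 - ρ))         # Fisher's z-transformation
    zstat = sqrt(n - d - 3) * z              # approximately N(0, 1) under the null
    return 2 * (1 - normal_cdf(abs(zstat)))
end

p_weak = fisher_z_pvalue(0.3, 50)            # moderate correlation, small sample
p_strong = fisher_z_pvalue(0.5, 100)         # stronger correlation, larger sample
p_cond = fisher_z_pvalue(0.3, 50; d = 2)     # same ρ, two conditioning variables
```

Conditioning on more variables reduces the effective sample size `n - d - 3`, so for a fixed coefficient the p-value grows with `d`.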
Examples
CausalityTools.CorrTestResult (Type)
CorrTestResult(pvalue, ρ, z)
A simple struct that holds the results of a `CorrTest`: the (partial) correlation coefficient `ρ`, Fisher's `z`, and `pvalue`, the two-sided p-value for the test.