Independence testing

Independence testing API

The independence test API is defined by

SurrogateTest

CausalityTools.SurrogateTestType
SurrogateTest <: IndependenceTest
SurrogateTest(measure, [est];
    nshuffles::Int = 100,
    surrogate = RandomShuffle(),
    rng = Random.default_rng(),
)

A generic (conditional) independence test for assessing whether two variables X and Y are independendent, potentially conditioned on a third variable Z, based on surrogate data.

When used with independence, a SurrogateTestResult is returned.

Description

This is a generic one-sided hypothesis test that checks whether x and y are independent (given z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.

There are different ways of shuffling, dictated by surrogate, each representing a distinct null hypothesis. For each shuffle, the provided measure is computed (using est, if relevant). This procedure is repeated nshuffles times, and a test summary is returned. The shuffled variable is always the first variable (X). Exceptions are:

  • If TransferEntropy measure such as TEShannon, then the source variable is always shuffled, and the target and conditional variable are left unshuffled.

Compatible measures

MeasurePairwiseConditionalRequires est
PearsonCorrelationNo
DistanceCorrelationNo
SMeasureNo
HMeasureNo
MMeasureNo
LMeasureNo
PairwiseAsymmetricInferenceYes
ConvergentCrossMappingYes
MIShannonYes
MIRenyiJizbaYes
MIRenyiSarbuYes
MITsallisMartinYes
MITsallisFuruichiYes
PartialCorrelationYes
CMIShannonYes
CMIRenyiJizbaYes
TEShannonYes
TERenyiJizbaYes
PMIYes

Examples

source
CausalityTools.SurrogateTestResultType
SurrogateTestResult(m, m_surr, pvalue)

Holds the result of a SurrogateTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure compute on the i-th permutation. pvalue is the one-sided p-value for the test.

source

LocalPermutationTest

CausalityTools.LocalPermutationTestType
LocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
    kperm::Int = 5,
    nshuffles::Int = 100,
    rng = Random.default_rng(),
    replace = true,
    w::Int = 0)

LocalPermutationTest is a generic conditional independence test (Runge, 2018) for assessing whether two variables X and Y are conditionally independendent given a third variable Z (all of which may be multivariate).

When used with independence, a LocalPermutationTestResult is returned.

Description

This is a generic one-sided hypothesis test that checks whether X and Y are independent (given Z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between x and y, but preserve dependencies between x and z.

The algorithm is as follows:

  1. Compute the original conditional independence statistic I(X; Y | Z).
  2. Allocate a scalar valued vector with space for nshuffles elements.
  3. For k ∈ [1, 2, …, nshuffles], repeat
    • For each zᵢ ∈ Y, let nᵢ be time indices of the kperm nearest neighbors of zᵢ, excluding the w nearest neighbors of zᵢ from the neighbor query (i.e w is the Theiler window).
    • Let xᵢ⋆ = X[j], where j is randomly sampled from nᵢ with replacement. This way, xᵢ is replaced with xⱼ only if zᵢ ≈ zⱼ (zᵢ and zⱼ are close). Repeat for i = 1, 2, …, n and obtain the shuffled X̂ = [x̂₁, x̂₂, …, x̂ₙ].
    • Compute the conditional independence statistic Iₖ(X̂; Y | Z).
    • Let Î[k] = Iₖ(X̂; Y | Z).
  4. Compute the p-value as count(Î[k] .<= I) / nshuffles).

In additional to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except neighbors in Y are used instead of Z and we I(X; Y) and Iₖ(X̂; Y) instead of I(X; Y | Z) and Iₖ(X̂; Y | Z).

Compatible measures

MeasurePairwiseConditionalRequires est
PartialCorrelationNo
DistanceCorrelationNo
CMIShannonYes
TEShannonYes
PMIYes

The LocalPermutationTest is only defined for conditional independence testing. Exceptions are for measures like TEShannon, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with LocalPermutationTest.

The nearest-neighbor approach in Runge (2018) can be reproduced by using the CMIShannon measure with the FPVP estimator.

Examples

source
CausalityTools.LocalPermutationTestResultType
LocalPermutationTestResult(m, m_surr, pvalue)

Holds the result of a LocalPermutationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure compute on the i-th permutation. pvalue is the one-sided p-value for the test.

source

JointDistanceDistributionTest

CausalityTools.JointDistanceDistributionTestType
JointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())

An independence test for two variables based on the JointDistanceDistribution (Amigó and Hirata, 2018).

When used with independence, a JDDTestResult is returned.

Description

The joint distance distribution (labelled Δ in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. JointDistanceDistributionTest formulates their method as an independence test.

Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check mean of Δ is skewed towards positive value, i.e.

  • $H_0 := \mu(\Delta) = 0$
  • $H_1 := \mu(\Delta) > 0$.

When used with independence, a JDDTestResult is returned, which contains the joint distance distribution and a p-value. If you only need Δ, use jdd directly.

Examples

This example shows how the JointDistanceDistributionTest can be used in practice.

source

CorrTest

CausalityTools.CorrTestType
CorrTest <: IndependenceTest
CorrTest()

An independence test based correlation (for two variables) and partial correlation (for three variables) (Levy and Narula (1978)@; as described in Schmidt et al. (2018)).

Uses PearsonCorrelation and PartialCorrelation internally.

Assumes that the input data are (multivariate) normally distributed. Then ρ(X, Y) = 0 implies X ⫫ Y and ρ(X, Y | 𝐙) = 0 implies X ⫫ Y | 𝐙.

Description

The null hypothesis is H₀ := ρ(X, Y | 𝐙) = 0. We use the approach in Levy & Narula (1978)(Levy and Narula, 1978) and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:

\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]

To test the null hypothesis against the alternative hypothesis H₁ := ρ(X, Y | 𝐙) > 0, calculate

\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]

and compute the two-sided p-value (Schmidt et al., 2018)

\[p(X, Y | \bf{Z}) = 2(1 - \phi(\sqrt{n - d - 3}Z(\hat{\rho}_{XY|\bf{Z}}))),\]

where $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.

Examples

source
CausalityTools.CorrTestResultType
CorrTestResult(pvalue, ρ, z)

A simple struct that holds the results of a CorrTest test: the (partial) correlation coefficient ρ, Fisher's z, and pvalue - the two-sided p-value for the test.

source