Independence testing
For practical applications, it is often useful to determine whether variables are independent, possibly conditioned on another set of variables. One way of doing so is to use an association measure together with some form of randomization-based independence testing. For example, to test dependence between time series, surrogate time series testing is often used. Many other frameworks for independence testing exist too. Here, we've collected some independence testing frameworks and made sure that they are compatible with as many of the implemented association measures as possible.
Independence testing API
The independence test API is defined by
Associations.independence — Functionindependence(test::IndependenceTest, x, y, [z]) → summaryPerform the given IndependenceTest test on data x, y and z. If only x and y are given, test must provide a bivariate association measure. If z is given too, then test must provide a conditional association measure.
Returns a test summary, whose type depends on test.
See IndependenceTest for a list of compatible tests.
Associations.IndependenceTest — Type

IndependenceTest

The supertype for all independence tests.
Concrete implementations
SurrogateAssociationTest
Associations.SurrogateAssociationTest — Type

SurrogateAssociationTest <: IndependenceTest
SurrogateAssociationTest(est_or_measure;
    nshuffles::Int = 100,
    surrogate = RandomShuffle(),
    rng = Random.default_rng(),
    show_progress = false,
)

A surrogate-data based generic (conditional) independence test for assessing whether two variables X and Y are independent, potentially conditioned on a third variable Z.
Compatible estimators and measures
- Compatible with AssociationMeasures that measure some sort of pairwise or conditional association. You must yourself determine whether using a particular measure is meaningful, and what it means.
- If used with a TransferEntropy measure such as TEShannon, the source variable is always shuffled, and the target and conditional variables are left unshuffled.
Usage
- Use with independence to perform a surrogate test with input data. This will return a SurrogateAssociationTestResult.
Description
This is a generic one-sided hypothesis test that checks whether x and y are independent (given z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.
The test first estimates the desired statistic using est_or_measure on the input data. Then, the first input variable is shuffled nshuffles times according to the given surrogate method (each type of surrogate represents a distinct null hypothesis). For each shuffle, est_or_measure is recomputed and the results are stored.
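The shuffle-and-recompute procedure above can be sketched in a few lines. The snippet below is an illustrative Python sketch, not the package's Julia API: the function name `surrogate_association_test` and the `stat` callable are hypothetical stand-ins, and `random.shuffle` plays the role of the simplest surrogate type (random shuffling); the package supports many other surrogate methods.

```python
import random

def surrogate_association_test(stat, x, y, nshuffles=100, rng=None):
    """One-sided surrogate test: shuffle x to build a null distribution
    for the association statistic stat(x, y)."""
    rng = rng or random.Random(0)
    m = stat(x, y)                 # statistic on the original data
    m_surr = []
    for _ in range(nshuffles):
        x_shuffled = x[:]          # random-shuffle surrogate of the first variable
        rng.shuffle(x_shuffled)
        m_surr.append(stat(x_shuffled, y))
    # one-sided p-value: fraction of surrogate statistics at least as
    # large as the observed statistic
    pvalue = sum(ms >= m for ms in m_surr) / nshuffles
    return m, m_surr, pvalue
```

With a strongly associated pair (e.g. identical series and a correlation statistic), the observed statistic exceeds essentially all surrogate statistics, so the p-value is near zero and independence is rejected.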
Examples
- Example 1: SMeasure test for pairwise independence.
- Example 2: DistanceCorrelation test for pairwise independence.
- Example 3: PartialCorrelation test for conditional independence.
- Example 4: MIShannon test for pairwise independence on categorical data.
- Example 5: CMIShannon test for conditional independence on categorical data.
- Example 6: MCR test for pairwise and conditional independence.
- Example 7: ChatterjeeCorrelation test for pairwise independence.
- Example 8: AzadkiaChatterjeeCoefficient test for pairwise and conditional independence.
Associations.SurrogateAssociationTestResult — Type

SurrogateAssociationTestResult(m, m_surr, pvalue)

Holds the result of a SurrogateAssociationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.
LocalPermutationTest
Associations.LocalPermutationTest — Type

LocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
    kperm::Int = 5,
    nshuffles::Int = 100,
    rng = Random.default_rng(),
    replace = true,
    w::Int = 0,
    show_progress = false)

LocalPermutationTest is a generic conditional independence test (Runge, 2018) for assessing whether two variables X and Y are conditionally independent given a third variable Z (all of which may be multivariate).
When used with independence, a LocalPermutationTestResult is returned.
Description
This is a generic one-sided hypothesis test that checks whether X and Y are independent (given Z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between x and y, but preserve dependencies between x and z.
The algorithm is as follows:
- Compute the original conditional independence statistic I(X; Y | Z).
- Allocate a scalar-valued vector Î with space for nshuffles elements.
- For k ∈ [1, 2, …, nshuffles], repeat:
    - For each zᵢ ∈ Z, let nᵢ be the time indices of the kperm nearest neighbors of zᵢ, excluding the w nearest neighbors of zᵢ from the neighbor query (i.e. w is the Theiler window).
    - Let xᵢ⋆ = X[j], where j is randomly sampled from nᵢ with replacement. This way, xᵢ is replaced with xⱼ only if zᵢ ≈ zⱼ (zᵢ and zⱼ are close). Repeat for i = 1, 2, …, n and obtain the shuffled X̂ = [x̂₁, x̂₂, …, x̂ₙ].
    - Compute the conditional independence statistic Iₖ(X̂; Y | Z).
    - Let Î[k] = Iₖ(X̂; Y | Z).
- Compute the p-value as count(Î .>= I) / nshuffles.
In addition to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except that neighbors in Y are used instead of Z, and we compute I(X; Y) and Iₖ(X̂; Y) instead of I(X; Y | Z) and Iₖ(X̂; Y | Z).
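The key ingredient of the algorithm above is the Z-constrained shuffle: an index i may only swap its x-value with indices whose z-values are close. The snippet below is an illustrative Python sketch of that shuffle for scalar z, not the package's implementation; the function name `local_permutation_shuffle` is hypothetical, and it uses a brute-force neighbor search where a real implementation would use a spatial tree.

```python
import random

def local_permutation_shuffle(x, z, kperm=5, w=0, rng=None):
    """Shuffle x while approximately preserving the dependence between x
    and z: x[i] is replaced by x[j], where j is drawn (with replacement)
    from the kperm nearest neighbors of z[i]. Indices within the Theiler
    window w of i are excluded from the neighbor query."""
    rng = rng or random.Random(0)
    n = len(x)
    x_hat = [None] * n
    for i in range(n):
        # candidate indices, excluding the Theiler window around i
        candidates = [j for j in range(n) if abs(j - i) > w]
        # brute-force nearest-neighbor query on the scalar z values
        candidates.sort(key=lambda j: abs(z[j] - z[i]))
        neighbors = candidates[:kperm]
        x_hat[i] = x[rng.choice(neighbors)]   # sample with replacement
    return x_hat
```

Because each x̂ᵢ comes from an index j with zⱼ ≈ zᵢ, the shuffle destroys any direct X–Y dependence while keeping the X–Z dependence largely intact, which is exactly what the null hypothesis of conditional independence requires.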
Compatible measures
| Measure | Pairwise | Conditional | Requires est | Note |
|---|---|---|---|---|
| PartialCorrelation | ✖ | ✓ | No | |
| DistanceCorrelation | ✖ | ✓ | No | |
| CMIShannon | ✖ | ✓ | Yes | |
| TEShannon | ✓ | ✓ | Yes | Pairwise tests are not possible with TransferEntropyEstimators, only with lower-level estimators, e.g. FPVP, GaussianMI or Kraskov |
| PartialMutualInformation | ✖ | ✓ | Yes | |
| AzadkiaChatterjeeCoefficient | ✖ | ✓ | No | |
The LocalPermutationTest is only defined for conditional independence testing. Exceptions are for measures like TEShannon, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with LocalPermutationTest.
The nearest-neighbor approach in Runge (2018) can be reproduced by using the CMIShannon measure with the FPVP estimator.
Examples
- Example 1: Conditional independence test using CMIShannon.
- Example 2: Conditional independence test using TEShannon.
- Example 3: Conditional independence test using AzadkiaChatterjeeCoefficient.
Associations.LocalPermutationTestResult — Type

LocalPermutationTestResult(m, m_surr, pvalue)

Holds the result of a LocalPermutationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.
JointDistanceDistributionTest
Associations.JointDistanceDistributionTest — Type

JointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())

An independence test for two variables based on the JointDistanceDistribution (Amigó and Hirata, 2018).
When used with independence, a JDDTestResult is returned.
Description
The joint distance distribution (labelled Δ in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. JointDistanceDistributionTest formulates their method as an independence test.
Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check whether the mean of Δ is skewed towards positive values, i.e.
- $H_0 := \mu(\Delta) = 0$
- $H_1 := \mu(\Delta) > 0$.
When used with independence, a JDDTestResult is returned, which contains the joint distance distribution and a p-value. If you only need Δ, use association with a JointDistanceDistribution instance directly.
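The upper-tailed t-test on the mean of Δ can be sketched directly from the hypotheses above. The snippet below is an illustrative Python sketch, not the package's Julia implementation; the function name `upper_tailed_tstat` is hypothetical, and it computes only the t-statistic (the p-value then follows from the Student-t survival function with `len(delta) - 1` degrees of freedom, e.g. via `scipy.stats.t.sf` if SciPy is available).

```python
import math

def upper_tailed_tstat(delta, mu0=0.0):
    """t-statistic for H0: mean(delta) == mu0 vs H1: mean(delta) > mu0."""
    n = len(delta)
    mean = sum(delta) / n
    var = sum((d - mean) ** 2 for d in delta) / (n - 1)  # sample variance
    return (mean - mu0) / math.sqrt(var / n)
```

A strongly positive t-statistic means the Δ-distribution is skewed towards positive values, i.e. evidence for directional coupling; a statistic near zero is consistent with independence.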
Examples
- Example 1. Detecting (in)dependence in bidirectionally coupled logistic maps.
Associations.JDDTestResult — TypeJDDTestResult(Δjdd, hypothetical_μ, pvalue)Holds the results of JointDistanceDistributionTest. Δjdd is the Δ-distribution, hypothetical_μ is the hypothetical mean of the Δ-distribution under the null, and pvalue is the p-value for the one-sided t-test.
CorrTest
Associations.CorrTest — Type

CorrTest <: IndependenceTest
CorrTest()

An independence test based on correlation (for two variables) and partial correlation (for three variables) (Levy and Narula, 1978), as described in Schmidt et al. (2018).
Uses PearsonCorrelation and PartialCorrelation internally.
Assumes that the input data are (multivariate) normally distributed. Then ρ(X, Y) = 0 implies X ⫫ Y and ρ(X, Y | 𝐙) = 0 implies X ⫫ Y | 𝐙.
Description
The null hypothesis is H₀ := ρ(X, Y | 𝐙) = 0. We use the approach of Levy and Narula (1978) and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:
\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]
To test the null hypothesis against the alternative hypothesis H₁ := ρ(X, Y | 𝐙) > 0, calculate
\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]
and compute the two-sided p-value (Schmidt et al., 2018)
\[p(X, Y | \bf{Z}) = 2\left(1 - \Phi\left(|\hat{Z}|\right)\right),\]
where $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.
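Folding the transformation and the standardization together, the two-sided p-value can be computed from $\hat{\rho}$, $n$ and $d$ alone. The snippet below is an illustrative Python sketch, not the package's Julia implementation; the function name `corrtest_pvalue` is hypothetical. It uses the conventional Fisher z with the ½ factor absorbed (matching $\hat{Z}$ above), and expresses the standard normal CDF $\Phi$ via the error function.

```python
import math

def corrtest_pvalue(rho, n, d=0):
    """Two-sided p-value for H0: rho(X, Y | Z) = 0 via the Fisher
    z-transform. d is the dimension of the conditioning set Z
    (d = 0 for the pairwise case)."""
    z = 0.5 * math.log((1 + rho) / (1 - rho))   # Fisher's z
    zhat = math.sqrt(n - d - 3) * z             # standardized statistic
    # standard normal CDF expressed via the error function
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    return 2 * (1 - phi(abs(zhat)))
```

As expected, a zero correlation gives p = 1, and stronger correlations or larger samples drive the p-value towards zero.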
Examples
- Example 1. Pairwise and conditional tests for independence on coupled noise processes.
Associations.CorrTestResult — Type

CorrTestResult(pvalue, ρ, z)

A simple struct that holds the results of a CorrTest: the (partial) correlation coefficient ρ, Fisher's z, and pvalue, the two-sided p-value for the test.
SECMITest
Associations.SECMITest — Type

SECMITest <: IndependenceTest
SECMITest(est; nshuffles = 19, surrogate = RandomShuffle(), rng = Random.default_rng())

A test for conditional independence based on the ShortExpansionConditionalMutualInformation measure (Kubkowski et al., 2021).

The first argument est must be a ComplexityMeasures.InformationMeasureEstimator that provides the ShortExpansionConditionalMutualInformation instance. See examples below.
Examples
- Example 1: Independence test for small sample sizes using CodifyVariables with ComplexityMeasures.ValueBinning discretization.
- Example 2: Independence test for small sample sizes with categorical data (using CodifyVariables with ComplexityMeasures.UniqueElements discretization).
Associations.SECMITestResult — Type

SECMITestResult <: IndependenceTestResult
SECMITestResult(secmi₀, secmiₖ, p, μ̂, σ̂, emp_cdf, D𝒩, D𝒳², nshuffles::Int)

A simple struct that holds the computed parameters of a SECMITest when called with independence, as described in Kubkowski et al. (2021).
Parameters
- p: The p-value for the test.
- secmi₀: The value of the ShortExpansionConditionalMutualInformation measure estimated on the original data.
- secmiₖ: An ensemble of values for the ShortExpansionConditionalMutualInformation measure estimated on triples SECMI(X̂, Y, Z), where X̂ indicates a shuffled version of the first variable X and length(secmiₖ) == nshuffles.
- μ̂: The estimated mean of the secmiₖ.
- σ̂: The estimated standard deviation of the secmiₖ.
- emp_cdf: The empirical cumulative distribution function (CDF) of the secmiₖs.
- D𝒩: The $D_{N(\hat{\mu}, \hat{\sigma})}$ statistic.
- D𝒳²: The $D_{\chi^2}$ statistic.
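To illustrate how parameters like μ̂, σ̂, emp_cdf and D𝒩 fit together, the snippet below sketches one plausible reading of the $D_{N(\hat{\mu}, \hat{\sigma})}$ statistic: a Kolmogorov-type sup-distance between the empirical CDF of the shuffled SECMI values and the CDF of a fitted normal distribution. This is an illustrative Python sketch under that assumption, not the package's implementation; the function name `d_normal` is hypothetical, and the exact definitions of $D_{N(\hat{\mu}, \hat{\sigma})}$ and $D_{\chi^2}$ are given in Kubkowski et al. (2021). The $D_{\chi^2}$ statistic would be computed analogously against a fitted chi-squared CDF.

```python
import math

def d_normal(secmi_k):
    """Sup-distance between the empirical CDF of the shuffled SECMI
    values and the CDF of a normal distribution fitted to them."""
    n = len(secmi_k)
    mu = sum(secmi_k) / n                                   # mu-hat
    sigma = math.sqrt(sum((s - mu) ** 2 for s in secmi_k) / (n - 1))  # sigma-hat
    phi = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))  # standard normal CDF
    xs = sorted(secmi_k)
    # empirical CDF jumps to (i + 1)/n at the i-th sorted sample point;
    # take the largest discrepancy to the fitted normal CDF there
    return max(abs((i + 1) / n - phi((x - mu) / sigma))
               for i, x in enumerate(xs))
```

A small distance suggests the normal approximation to the null distribution is adequate; a large one suggests the chi-squared approximation (via D𝒳²) should be preferred instead.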