Independence testing
For practical applications, it is often useful to determine whether variables are independent, possibly conditioned upon another set of variables. One way of doing so is to utilize an association measure and perform some form of randomization-based independence testing.
For example, to test for dependence between time series, surrogate time series testing can be used. Many other frameworks for independence testing exist too. Here, we've collected some independence testing frameworks and made sure that they are compatible with as many of the implemented association measures as possible.
Independence testing API
The independence test API is defined by
Associations.independence — Function

independence(test::IndependenceTest, x, y, [z]) → summary

Perform the given IndependenceTest test on data x, y and z. If only x and y are given, test must provide a bivariate association measure. If z is given too, then test must provide a conditional association measure.

Returns a test summary, whose type depends on test.

See IndependenceTest for a list of compatible tests.
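For instance, a minimal sketch of a pairwise test on synthetic data (the measure and keyword choices here are illustrative, not prescriptive):

    using Associations, Random
    rng = Random.Xoshiro(1234)
    x = randn(rng, 1000)
    y = x .+ randn(rng, 1000)   # y depends on x
    test = SurrogateAssociationTest(DistanceCorrelation(); nshuffles = 200)
    result = independence(test, x, y)   # → SurrogateAssociationTestResult

The returned summary can then be inspected for the estimated measure and its p-value.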
Associations.IndependenceTest
— Type

IndependenceTest

The supertype for all independence tests.

Concrete implementations

- SurrogateAssociationTest
- LocalPermutationTest
- JointDistanceDistributionTest
- CorrTest
- SECMITest
SurrogateAssociationTest
Associations.SurrogateAssociationTest — Type

SurrogateAssociationTest <: IndependenceTest
SurrogateAssociationTest(est_or_measure;
    nshuffles::Int = 100,
    surrogate = RandomShuffle(),
    rng = Random.default_rng(),
    show_progress = false,
)
A generic surrogate-data based (conditional) independence test for assessing whether two variables X and Y are independent, potentially conditioned on a third variable Z.
Compatible estimators and measures

- Compatible with AssociationMeasures that quantify some sort of pairwise or conditional association. You must determine for yourself whether using a particular measure is meaningful, and what it means.
- If used with a TransferEntropy measure such as TEShannon, then the source variable is always shuffled, and the target and conditional variables are left unshuffled.
Usage

- Use with independence to perform a surrogate test with input data. This will return a SurrogateAssociationTestResult.
Description

This is a generic one-sided hypothesis test that checks whether x and y are independent (given z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between the input variables.

The test first estimates the desired statistic using est_or_measure on the input data. Then, the first input variable is shuffled nshuffles times according to the given surrogate method (each type of surrogate represents a distinct null hypothesis). For each shuffle, est_or_measure is recomputed and the results are stored.
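A conceptual sketch of this procedure, assuming TimeseriesSurrogates.jl's surrogenerator and RandomShuffle (the test performs this loop internally; the variable names here are illustrative):

    using Associations, TimeseriesSurrogates
    x, y = randn(500), randn(500)
    measure = DistanceCorrelation()
    Δ = association(measure, x, y)            # statistic on the original data
    sg = surrogenerator(x, RandomShuffle())   # generates shuffled versions of x
    Δs = [association(measure, sg(), y) for _ in 1:100]   # null distribution
    # One-sided p-value: fraction of surrogate statistics at least as large
    # as the original.
    p = count(Δs .>= Δ) / 100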
Examples

- Example 1: SMeasure test for pairwise independence.
- Example 2: DistanceCorrelation test for pairwise independence.
- Example 3: PartialCorrelation test for conditional independence.
- Example 4: MIShannon test for pairwise independence on categorical data.
- Example 5: CMIShannon test for conditional independence on categorical data.
- Example 6: MCR test for pairwise and conditional independence.
- Example 7: ChatterjeeCorrelation test for pairwise independence.
- Example 8: AzadkiaChatterjeeCoefficient test for pairwise and conditional independence.
Associations.SurrogateAssociationTestResult — Type

SurrogateAssociationTestResult(m, m_surr, pvalue)

Holds the result of a SurrogateAssociationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.
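For example, the fields of a returned result can be inspected directly (field names assumed to match the constructor signature above):

    using Associations
    x, y = randn(1000), randn(1000)
    res = independence(SurrogateAssociationTest(DistanceCorrelation()), x, y)
    res.m        # the measure on the original data
    res.m_surr   # the null distribution (100 shuffles by default)
    res.pvalue   # the one-sided p-value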
LocalPermutationTest
Associations.LocalPermutationTest — Type

LocalPermutationTest <: IndependenceTest
LocalPermutationTest(measure, [est];
    kperm::Int = 5,
    nshuffles::Int = 100,
    rng = Random.default_rng(),
    replace = true,
    w::Int = 0,
    show_progress = false)
LocalPermutationTest is a generic conditional independence test (Runge, 2018) for assessing whether two variables X and Y are conditionally independent given a third variable Z (all of which may be multivariate).

When used with independence, a LocalPermutationTestResult is returned.
Description

This is a generic one-sided hypothesis test that checks whether X and Y are independent (given Z, if provided) based on resampling from a null distribution assumed to represent independence between the variables. The null distribution is generated by repeatedly shuffling the input data in some way that is intended to break any dependence between X and Y, but preserve dependencies between X and Z.
The algorithm is as follows:

- Compute the original conditional independence statistic I(X; Y | Z).
- Allocate a scalar-valued vector Î with space for nshuffles elements.
- For k ∈ [1, 2, …, nshuffles], repeat:
    - For each zᵢ ∈ Z, let nᵢ be the time indices of the kperm nearest neighbors of zᵢ, excluding the w nearest neighbors of zᵢ from the neighbor query (i.e. w is the Theiler window).
    - Let xᵢ⋆ = X[j], where j is randomly sampled from nᵢ with replacement. This way, xᵢ is replaced with xⱼ only if zᵢ ≈ zⱼ (zᵢ and zⱼ are close). Repeat for i = 1, 2, …, n and obtain the shuffled X̂ = [x̂₁, x̂₂, …, x̂ₙ].
    - Compute the conditional independence statistic Iₖ(X̂; Y | Z).
    - Let Î[k] = Iₖ(X̂; Y | Z).
- Compute the p-value as count(Î .<= I) / nshuffles (a sketch of the local shuffling step follows below).
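The local shuffling step (finding neighbors of each zᵢ in Z, then resampling X within those neighborhoods) could be sketched as follows, assuming Z is a vector of state vectors and Neighborhood.jl's searchstructure/bulkisearch API; the helper name local_shuffle is hypothetical:

    using Neighborhood, Random

    # Hypothetical helper: permute X locally, within kperm-nearest-neighbor
    # neighborhoods of the conditioning variable Z.
    function local_shuffle(X, Z; kperm = 5, w = 0, rng = Random.default_rng())
        tree = searchstructure(KDTree, Z, Euclidean())
        # Time indices of the kperm nearest neighbors of each zᵢ, excluding
        # the w temporally closest points (Theiler window).
        nᵢs = bulkisearch(tree, Z, NeighborNumber(kperm), Theiler(w))
        # Replace each xᵢ with some xⱼ for which zⱼ ≈ zᵢ.
        return [X[rand(rng, nᵢ)] for nᵢ in nᵢs]
    end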
In addition to the conditional variant from Runge (2018), we also provide a pairwise version, where the shuffling procedure is identical, except that neighbors in Y are used instead of Z, and we compute I(X; Y) and Iₖ(X̂; Y) instead of I(X; Y | Z) and Iₖ(X̂; Y | Z).
Compatible measures

| Measure | Pairwise | Conditional | Requires est | Note |
|---|---|---|---|---|
| PartialCorrelation | ✖ | ✓ | No | |
| DistanceCorrelation | ✖ | ✓ | No | |
| CMIShannon | ✖ | ✓ | Yes | |
| TEShannon | ✓ | ✓ | Yes | Pairwise tests are not possible with TransferEntropyEstimators, only with lower-level estimators, e.g. FPVP, GaussianMI or Kraskov |
| PartialMutualInformation | ✖ | ✓ | Yes | |
| AzadkiaChatterjeeCoefficient | ✖ | ✓ | No | |
The LocalPermutationTest is only defined for conditional independence testing. Exceptions are measures like TEShannon, which use conditional measures under the hood even for their pairwise variants, and are therefore compatible with LocalPermutationTest.

The nearest-neighbor approach in Runge (2018) can be reproduced by using the CMIShannon measure with the FPVP estimator.
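For instance, a minimal sketch on synthetic data, following the measure, [est] signature above (the coupling and keyword values are illustrative):

    using Associations, Random
    rng = Random.Xoshiro(1234)
    z = randn(rng, 1000)
    x = z .+ randn(rng, 1000)   # x and y are conditionally independent given z
    y = z .+ randn(rng, 1000)
    test = LocalPermutationTest(CMIShannon(), FPVP(); kperm = 10, nshuffles = 100)
    result = independence(test, x, y, z)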
Examples

- Example 1: Conditional independence test using CMIShannon.
- Example 2: Conditional independence test using TEShannon.
- Example 3: Conditional independence test using AzadkiaChatterjeeCoefficient.
Associations.LocalPermutationTestResult — Type

LocalPermutationTestResult(m, m_surr, pvalue)

Holds the result of a LocalPermutationTest. m is the measure computed on the original data. m_surr is a vector of the measure computed on permuted data, where m_surr[i] is the measure computed on the i-th permutation. pvalue is the one-sided p-value for the test.
JointDistanceDistributionTest
Associations.JointDistanceDistributionTest — Type

JointDistanceDistributionTest <: IndependenceTest
JointDistanceDistributionTest(measure::JointDistanceDistribution; rng = Random.default_rng())

An independence test for two variables based on the JointDistanceDistribution (Amigó and Hirata, 2018).

When used with independence, a JDDTestResult is returned.
Description

The joint distance distribution (labelled Δ in their paper) is used by Amigó & Hirata (2018) to detect directional couplings of the form $X \to Y$ or $Y \to X$. JointDistanceDistributionTest formulates their method as an independence test.

Formally, we test the hypothesis $H_0$ (the variables are independent) against $H_1$ (there is directional coupling between the variables). To do so, we use a right-sided/upper-tailed t-test to check whether the mean of Δ is skewed towards positive values, i.e.

- $H_0 := \mu(\Delta) = 0$
- $H_1 := \mu(\Delta) > 0$.

When used with independence, a JDDTestResult is returned, which contains the joint distance distribution and a p-value. If you only need Δ, use association with a JointDistanceDistribution instance directly.
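A minimal usage sketch, assuming the default JointDistanceDistribution parameters:

    using Associations
    x, y = randn(1000), randn(1000)
    test = JointDistanceDistributionTest(JointDistanceDistribution())
    result = independence(test, x, y)   # → JDDTestResult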
Examples

- Example 1: Detecting (in)dependence in bidirectionally coupled logistic maps.
Associations.JDDTestResult — Type

JDDTestResult(Δjdd, hypothetical_μ, pvalue)

Holds the results of a JointDistanceDistributionTest. Δjdd is the Δ-distribution, hypothetical_μ is the hypothetical mean of the Δ-distribution under the null, and pvalue is the p-value for the one-sided t-test.
CorrTest
Associations.CorrTest — Type

CorrTest <: IndependenceTest
CorrTest()

An independence test based on correlation (for two variables) and partial correlation (for three variables) (Levy and Narula, 1978), as described in Schmidt et al. (2018).

Uses PearsonCorrelation and PartialCorrelation internally.
Assumes that the input data are (multivariate) normally distributed. Then ρ(X, Y) = 0 implies X ⫫ Y, and ρ(X, Y | 𝐙) = 0 implies X ⫫ Y | 𝐙.
Description

The null hypothesis is H₀ := ρ(X, Y | 𝐙) = 0. We use the approach of Levy and Narula (1978) and compute the Z-transformation of the observed (partial) correlation coefficient $\hat{\rho}_{XY|\bf{Z}}$:

\[Z(\hat{\rho}_{XY|\bf{Z}}) = \log\dfrac{1 + \hat{\rho}_{XY|\bf{Z}}}{1 - \hat{\rho}_{XY|\bf{Z}}}.\]

To test the null hypothesis against the alternative hypothesis H₁ := ρ(X, Y | 𝐙) > 0, calculate

\[\hat{Z} = \dfrac{1}{2}\dfrac{Z(\hat{\rho}_{XY|\bf{Z}}) - Z(0)}{\sqrt{1/(n - d - 3)}},\]

and compute the two-sided p-value (Schmidt et al., 2018)

\[p(X, Y | \bf{Z}) = 2(1 - \phi(\sqrt{n - d - 3}Z(\hat{\rho}_{XY|\bf{Z}}))),\]

where $d$ is the dimension of $\bf{Z}$ and $n$ is the number of samples. For the pairwise case, the procedure is identical, but set $\bf{Z} = \emptyset$.
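A minimal sketch on synthetic, normally distributed data (the coupling is chosen for illustration):

    using Associations, Random
    rng = Random.Xoshiro(42)
    z = randn(rng, 2000)
    x = z .+ randn(rng, 2000)
    y = z .+ randn(rng, 2000)
    independence(CorrTest(), x, y)      # pairwise: x and y are correlated via z
    independence(CorrTest(), x, y, z)   # conditional: x ⫫ y given z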
Examples

- Example 1: Pairwise and conditional tests for independence on coupled noise processes.
Associations.CorrTestResult — Type

CorrTestResult(pvalue, ρ, z)

A simple struct that holds the results of a CorrTest: the (partial) correlation coefficient ρ, Fisher's z, and pvalue, the two-sided p-value for the test.
SECMITest
Associations.SECMITest — Type

SECMITest <: IndependenceTest
SECMITest(est; nshuffles = 19, surrogate = RandomShuffle(), rng = Random.default_rng())

A test for conditional independence based on the ShortExpansionConditionalMutualInformation measure (Kubkowski et al., 2021).

The first argument est must be an InformationMeasureEstimator that provides the ShortExpansionConditionalMutualInformation instance. See the examples below.
Examples

- Example 1: Independence test for small sample sizes using CodifyVariables with ValueBinning discretization.
- Example 2: Independence test for small sample sizes with categorical data (using CodifyVariables with UniqueElements discretization).
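As a construction sketch in the spirit of Example 1 (the JointProbabilities estimator and the 3-bin discretization are assumptions, not the only options):

    using Associations
    # Discretize each variable with a 3-bin value histogram, then estimate
    # the SECMI measure from the resulting joint probabilities.
    disc = CodifyVariables(ValueBinning(RectangularBinning(3)))
    est = JointProbabilities(ShortExpansionConditionalMutualInformation(), disc)
    test = SECMITest(est; nshuffles = 19)
    x, y, z = randn(200), randn(200), randn(200)
    independence(test, x, y, z)   # → SECMITestResult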
Associations.SECMITestResult — Type

SECMITestResult <: IndependenceTestResult
SECMITestResult(secmi₀, secmiₖ, p, μ̂, σ̂, emp_cdf, D𝒩, D𝒳², nshuffles::Int)

A simple struct that holds the computed parameters of a SECMITest when called with independence, as described in Kubkowski et al. (2021).

Parameters

- p: The p-value for the test.
- secmi₀: The value of the ShortExpansionConditionalMutualInformation measure estimated on the original data.
- secmiₖ: An ensemble of values for the ShortExpansionConditionalMutualInformation measure estimated on triples SECMI(X̂, Y, Z), where X̂ indicates a shuffled version of the first variable X and length(secmiₖ) == nshuffles.
- μ̂: The estimated mean of the secmiₖ.
- σ̂: The estimated standard deviation of the secmiₖ.
- emp_cdf: The empirical cumulative distribution function (CDF) of the secmiₖ.
- D𝒩: The $D_{N(\hat{\mu}, \hat{\sigma})}$ statistic.
- D𝒳²: The $D_{\chi^2}$ statistic.