Independence testing
JointDistanceDistributionTest
Bidirectionally coupled logistic maps
Let's use the built-in `Logistic2Bidir` discrete dynamical system to create a pair of bidirectionally coupled time series, and use the `JointDistanceDistributionTest` to see if we can confirm from the observed time series that these variables are bidirectionally coupled. We'll use a significance level of α = 0.01, i.e. a confidence level of 1 − α = 0.99.
We start by generating some time series and configuring the test.
using CausalityTools
sys = system(Logistic2Bidir(c_xy = 0.5, c_yx = 0.4))
x, y = columns(first(trajectory(sys, 2000, Ttr = 10000)))
measure = JointDistanceDistribution(D = 5, B = 5)
test = JointDistanceDistributionTest(measure)
JointDistanceDistributionTest{JointDistanceDistribution{Euclidean, Float64}, Random.TaskLocalRNG}(JointDistanceDistribution{Euclidean, Float64}(Euclidean(0.0), 5, 5, -1, 0.0), Random.TaskLocalRNG())
Now, we test for independence in both directions.
independence(test, x, y)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ): 0.4709078138991162
p-value: 1.6810181864279627e-5
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
independence(test, y, x)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ): 0.4652435338034594
p-value: 2.69017598871768e-5
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, the null hypothesis is rejected in both directions at the pre-determined significance level, and hence we detect directional coupling in both directions.
Non-coupled logistic maps
What happens in the example above if there is no coupling?
sys = system(Logistic2Bidir(c_xy = 0.00, c_yx = 0.0))
x, y = columns(first(trajectory(sys, 1000, Ttr = 10000)));
rxy = independence(test, x, y)
ryx = independence(test, y, x)
pvalue(rxy), pvalue(ryx)
(0.0686678284769523, 0.10720581302326959)
At significance level α = 0.01, we can't reject the null in either direction, hence there's not enough evidence in the data to suggest directional coupling.
LocalPermutationTest
Conditional mutual information (Shannon, differential)
Chain of random variables $X \to Y \to Z$
Here, we'll create a three-variable scenario where $X$ and $Z$ are connected through $Y$, so that $I(X; Z | Y) = 0$ and $I(X; Y | Z) > 0$. We'll test for conditional independence using Shannon conditional mutual information (`CMIShannon`). To estimate CMI, we'll use the `Kraskov` differential entropy estimator, which naively computes CMI as a sum of entropy terms without guaranteed bias cancellation.
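Concretely, the naive estimate is a sum and difference of four separately estimated entropies:

$$\hat{I}(X; Y \mid Z) = \hat{H}(X, Z) + \hat{H}(Y, Z) - \hat{H}(Z) - \hat{H}(X, Y, Z),$$

where each $\hat{H}$ is a Kraskov nearest-neighbor estimate. Because the four terms are estimated in spaces of different dimension, their biases differ and need not cancel, so the sum can even come out negative.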
using CausalityTools
X = randn(1000)
Y = X .+ randn(1000) .* 0.4
Z = randn(1000) .+ Y
x, y, z = StateSpaceSet.((X, Y, Z))
test = LocalPermutationTest(CMIShannon(base = 2), Kraskov(k = 10), nshuffles = 30)
test_result = independence(test, x, y, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.9213969421996389
Ensemble quantiles (30 permutations):
(99.9%): -0.08648647147323632
(99%): -0.08653371882228578
(95%): -0.08756963820227828
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
We expect a detectable influence from $X$ to $Y$, whether or not we condition on $Z$, because $Z$ influences neither $X$ nor $Y$. The null hypothesis is that the first two variables are conditionally independent given the third, which we reject with a very low p-value. Hence, we accept the alternative hypothesis that $X$ and $Y$ are conditionally dependent given $Z$.
test_result = independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.09728118529104712
Ensemble quantiles (30 permutations):
(99.9%): -0.11997086678126215
(99%): -0.12221744449848704
(95%): -0.1385539114467165
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
We would expect $X$ and $Z$ to be conditionally independent given $Y$, since $Y$ is the variable that transmits information from $X$ to $Z$. Here, however, the test still rejects the null hypothesis: the estimate lies above all the surrogate quantiles, so the p-value is zero. This is a reminder that naive entropy-sum estimators carry no guarantee of bias cancellation, and their conclusions should be interpreted with care.
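As a sanity check: for jointly Gaussian variables like these, Shannon CMI (in nats) has a closed form in terms of the partial correlation,

$$I(X; Z \mid Y) = -\tfrac{1}{2} \ln\left(1 - \rho_{XZ \cdot Y}^2\right),$$

and for this chain $\rho_{XZ \cdot Y} = 0$, so the true CMI is exactly zero; any nonzero estimate is estimator noise or bias.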
Transfer entropy (Shannon, differential)
Chain of random variables $X \to Y \to Z \to W$
Here, we demonstrate `LocalPermutationTest` with the `TEShannon` measure (default parameters) and the `FPVP` estimator. We'll use a system of four coupled logistic maps linked $X \to Y \to Z \to W$.
using CausalityTools
using Random; rng = Random.default_rng()
s = system(Logistic4Chain(; xi = rand(4)))
x, y, z, w = columns(first(trajectory(s, 2000)))
test = LocalPermutationTest(TEShannon(), FPVP(), nshuffles = 50)
test_result = independence(test, x, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.10570729402665664
Ensemble quantiles (50 permutations):
(99.9%): -0.4454602991089679
(99%): -0.44825828166289167
(95%): -0.4590188781970154
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
There is significant transfer entropy from $X \to Z$. We should expect this transfer entropy to be non-significant when conditioning on $Y$, because all information from $X$ to $Z$ is transferred through $Y$.
test_result = independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -1.2198764008727012
Ensemble quantiles (50 permutations):
(99.9%): -1.1421298600618275
(99%): -1.1437674667616895
(95%): -1.1491416893659891
p-value: 1.0
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, we cannot reject the null hypothesis that $X$ and $Z$ are conditionally independent given $Y$.
The same goes for variables one step up the chain:
test_result = independence(test, y, w, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -1.2039122940560982
Ensemble quantiles (50 permutations):
(99.9%): -1.112077640126405
(99%): -1.1127495489105437
(95%): -1.115043136902454
p-value: 1.0
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
SurrogateTest
Distance correlation
using CausalityTools
x = randn(1000)
y = randn(1000) .+ 0.5x
independence(SurrogateTest(DistanceCorrelation()), x, y)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.429178979414558
Ensemble quantiles (100 permutations):
(99.9%): 0.084027076408428
(99%): 0.08277641951651224
(95%): 0.07679698601758908
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
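The procedure behind `SurrogateTest` can be sketched in a few lines: compute the statistic on the original data, recompute it on surrogate (here, randomly shuffled) versions of one variable, and report the fraction of surrogate values at least as large. This is a simplified standalone sketch with a generic statistic `f`, not the library's implementation, which supports many more surrogate types:

```julia
# Minimal sketch of a surrogate (permutation) test for a generic
# bivariate statistic `f`. Shuffling `y` destroys any dependence on `x`
# while preserving its marginal distribution.
using Random, Statistics

function surrogate_pvalue(f, x, y; nshuffles = 100, rng = Random.default_rng())
    Iobs = f(x, y)                      # statistic on the original data
    exceed = count(1:nshuffles) do _
        f(x, shuffle(rng, y)) >= Iobs   # statistic on a shuffled surrogate
    end
    return exceed / nshuffles           # p-value: fraction ≥ observed
end

rng = MersenneTwister(42)
x = randn(rng, 500)
y = x .+ 0.1 .* randn(rng, 500)         # strongly coupled to x
surrogate_pvalue((a, b) -> abs(cor(a, b)), x, y; rng = rng)
```

For the strongly coupled pair above, no shuffled surrogate comes close to the observed correlation, so the returned p-value is 0.0.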
Partial correlation
using CausalityTools
x = randn(1000)
y = randn(1000) .+ 0.5x
z = randn(1000) .+ 0.8y
independence(SurrogateTest(PartialCorrelation()), x, z, y)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.005884331955841984
Ensemble quantiles (100 permutations):
(99.9%): 0.06003425644462099
(99%): 0.046554971687715385
(95%): 0.040289761256576566
p-value: 0.38
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
Mutual information (`MIShannon`, categorical)
In this example, we expect the `preference` and `food` variables to be independent.
using CausalityTools
# Simulate
n = 1000
preference = rand(["yes", "no"], n)
food = rand(["veggies", "meat", "fish"], n)
test = SurrogateTest(MIShannon(), Contingency())
independence(test, preference, food)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.002287507119021698
Ensemble quantiles (100 permutations):
(99.9%): 0.009450018397440377
(99%): 0.0058292437747054
(95%): 0.00396360134950233
p-value: 0.13
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, there's not enough evidence to reject the null hypothesis that the variables are independent.
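For categorical data like this, the `Contingency` approach is, in spirit, a plug-in estimate of MI from the empirical joint and marginal frequencies. A standalone sketch (an illustration, not the library's implementation):

```julia
# Plug-in Shannon MI (in bits) from empirical frequencies of two
# categorical variables; an illustrative sketch only.
using Random

function plugin_mi(x, y; base = 2)
    n = length(x)
    pxy = Dict{Tuple{String,String},Float64}()
    px = Dict{String,Float64}()
    py = Dict{String,Float64}()
    for (a, b) in zip(x, y)
        pxy[(a, b)] = get(pxy, (a, b), 0.0) + 1 / n
        px[a] = get(px, a, 0.0) + 1 / n
        py[b] = get(py, b, 0.0) + 1 / n
    end
    # I(X;Y) = Σ p(a,b) log[p(a,b) / (p(a)p(b))]
    return sum(p * log(base, p / (px[a] * py[b])) for ((a, b), p) in pxy)
end

rng = MersenneTwister(1)
pref = rand(rng, ["yes", "no"], 1000)
food = rand(rng, ["veggies", "meat", "fish"], 1000)
plugin_mi(pref, food)   # near zero for independent variables
```

The small positive value it returns for independent data is exactly the finite-sample bias that the surrogate ensemble accounts for.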
Conditional mutual information (`CMIShannon`, categorical)
Here, we simulate a survey at a ski resort. The data are such that the place a person grew up is associated with how many times they fell while skiing. The control happens through an intermediate variable `preferred_equipment`, which indicates what type of physical activity the person has engaged with in the past. Some activities, like skateboarding, lead to better overall balance, so people who are good on a skateboard don't fall much, while people who do less challenging activities fall more often.
We should be able to reject `places ⫫ experience`, but not `places ⫫ experience | preferred_equipment`. Let's see if we can detect these relationships using (conditional) mutual information.
using CausalityTools
n = 10000
places = rand(["city", "countryside", "under a rock"], n);
preferred_equipment = map(places) do place
    if place == "city"
        return rand(["skateboard", "bmx bike"])
    elseif place == "countryside"
        return rand(["sled", "snowcarpet"])
    else
        return rand(["private jet", "ferris wheel"])
    end
end;
experience = map(preferred_equipment) do equipment
    if equipment ∈ ["skateboard", "bmx bike"]
        return "didn't fall"
    elseif equipment ∈ ["sled", "snowcarpet"]
        return "fell 3 times or less"
    else
        return "fell uncountably many times"
    end
end;
test_mi = independence(SurrogateTest(MIShannon(), Contingency()), places, experience)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.913798558297143
Ensemble quantiles (100 permutations):
(99.9%): 0.0009412682118663583
(99%): 0.0005850271905672326
(95%): 0.0003825112305142701
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, the evidence favors the alternative hypothesis that `places` and `experience` are dependent.
test_cmi = independence(SurrogateTest(CMIShannon(), Contingency()), places, experience, preferred_equipment)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.0
Ensemble quantiles (100 permutations):
(99.9%): 5.122954902592361e-16
(99%): 4.80613281793856e-16
(95%): 4.64168491836522e-16
p-value: 0.78
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
Again, as expected, when conditioning on the mediating variable, the dependence disappears, and we can't reject the null hypothesis of independence.
Transfer entropy (`TEShannon`)
Pairwise
We'll see if we can reject independence for two unidirectionally coupled time series where `x` drives `y`.
using CausalityTools
sys = system(Logistic2Unidir(c_xy = 0.5)) # x affects y, but not the other way around.
x, y = columns(first(trajectory(sys, 1000, Ttr = 10000)))
test = SurrogateTest(TEShannon(), KSG1(k = 4))
independence(test, x, y)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 1.3748004482945986
Ensemble quantiles (100 permutations):
(99.9%): -0.3680974059723177
(99%): -0.36849935155962027
(95%): -0.38057142337793703
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, we can reject the null hypothesis that the future of `y` is independent of `x`, because `x` does in fact influence `y`. This doesn't change if we compute partial (conditional) transfer entropy with respect to some random extra time series, because that series influences neither of the other two variables.
independence(test, x, y, rand(length(x)))
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.35675703047048013
Ensemble quantiles (100 permutations):
(99.9%): -0.3165451470914358
(99%): -0.31728792533741806
(95%): -0.3228504922607298
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
SMeasure
using CausalityTools
x, y = randn(3000), randn(3000)
measure = SMeasure(dx = 3, dy = 3)
s = s_measure(measure, x, y)
0.010801005496614738
The `s` statistic is larger when there is stronger coupling and smaller when there is weaker coupling. To check whether `s` is significant (i.e. large enough to claim directional dependence), we can use a `SurrogateTest`, like here.
test = SurrogateTest(measure)
independence(test, x, y)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.010801005496614738
Ensemble quantiles (100 permutations):
(99.9%): 0.010839490873976272
(99%): 0.010835757873726532
(95%): 0.01062396658014453
p-value: 0.02
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
The p-value is 0.02, so strictly speaking we reject the null at α = 0.05, but not at the stricter 0.01 or 0.001 levels. Since `x` and `y` are independent white noise, the rejection at the 5% level is a false positive; there is no genuine directional coupling from `x` to `y`.
What happens if we use coupled variables?
z = x .+ 0.1y
independence(test, x, z)
`SurrogateTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
Estimated: 0.4802919709473727
Ensemble quantiles (100 permutations):
(99.9%): 0.010823601279271962
(99%): 0.010779027446573896
(95%): 0.010544734534271965
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
Now we can confidently reject the null hypothesis of independence, and conclude that there is evidence in the data to support directional dependence from `x` to `z`.
PATest
The following example demonstrates how to compute the significance of the `PA` directional dependence measure using a `PATest`. We'll use time series from a chain of unidirectionally coupled logistic maps coupled $X \to Y \to Z \to W$.
Conditional analysis
What happens if we compute $\Delta A_{X \to Z}$? We'd perhaps expect some information transfer $X \to Z$, even though the variables are not directly linked, because information is transferred through $Y$.
using CausalityTools
using DelayEmbeddings
using Random
rng = MersenneTwister(1234)
sys = system(Logistic4Chain(xi = [0.1, 0.2, 0.3, 0.4]; rng))
x, y, z, w = columns(first(trajectory(sys, 1000)))
τx = estimate_delay(x, "mi_min")
τy = estimate_delay(y, "mi_min")
test = PATest(PA(ηT = 1:10, τS = τx), FPVP())
ΔA_xz = independence(test, x, z)
`PATest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent"
Hₐ: "The first two variables are dependent"
---------------------------------------------------------------------
p-value: 0.00020077060192156678
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, the distribution is still significantly skewed towards positive values. To determine whether the information flow between $x$ and $z$ is mediated by $y$, we can compute the conditional distribution $\Delta A_{X \to Z | Y}$. If these values are still positively skewed, we conclude that $Y$ is not a mediating variable. If conditioning on $Y$ causes $\Delta A_{X \to Z | Y}$ to not be skewed towards positive values any more, then we conclude that $Y$ is a mediating variable and that $X$ and $Z$ are linked $X \to Y \to Z$.
measure = PA(ηT = 1:10, τS = τx, τC = τy)
test = PATest(measure, FPVP())
ΔA_xzy = independence(test, x, z, y)
`PATest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
p-value: 0.9999998147993345
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
We can't reject independence when conditioning on $Y$, so we conclude that $Y$ is a variable responsible for transferring information from $X$ to $Z$.
CorrTest
using CausalityTools
using StableRNGs
rng = StableRNG(1234)
# Some normally distributed data
X = randn(rng, 1000)
Y = 0.5*randn(rng, 1000) .+ X
Z = 0.5*randn(rng, 1000) .+ Y
W = randn(rng, 1000);
Let's test a few independence relationships. For example, we expect that X ⫫ W. We also expect dependence X !⫫ Z, but this dependence should vanish when conditioning on the intermediate variable, so we expect X ⫫ Z | Y.
independence(CorrTest(), X, W)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: -0.014905921318266182
p-value (two-sided): 0.6378593400241255
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the outcome is that we can't reject the null hypothesis that X ⫫ W.
independence(CorrTest(), X, Z)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: 0.8081260288756821
p-value (two-sided): 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
However, we can reject the null hypothesis that X ⫫ Z, so the evidence favors the alternative hypothesis X !⫫ Z.
independence(CorrTest(), X, Z, Y)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: -0.029272091250858005
p-value (two-sided): 0.35544695038151275
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the correlation between X and Z vanishes when conditioning on Y, because Y is solely responsible for the observed correlation between X and Z.
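`CorrTest` is built on partial correlation and the Fisher z-transform. A standalone sketch of the statistic (it mirrors, but is not, the library's internals, and reuses the data-generating recipe from above with a stdlib RNG):

```julia
# First-order partial correlation ρ_{XZ·Y} and its Fisher z-transform.
# Under H₀: ρ_{XZ·Y} = 0, z is approximately standard normal.
using Random, Statistics

rng = MersenneTwister(1234)
n = 1000
X = randn(rng, n)
Y = 0.5 .* randn(rng, n) .+ X
Z = 0.5 .* randn(rng, n) .+ Y

ρxz, ρxy, ρyz = cor(X, Z), cor(X, Y), cor(Y, Z)
# Recursive formula: remove the part of the X–Z correlation explained by Y
ρxz_y = (ρxz - ρxy * ρyz) / sqrt((1 - ρxy^2) * (1 - ρyz^2))
# Fisher transform; the scaling uses n minus the number of conditioning
# variables (1 here) minus 3
z = sqrt(n - 1 - 3) * 0.5 * log((1 + ρxz_y) / (1 - ρxz_y))
```

Here `ρxz` is large while `ρxz_y` is near zero, matching the `CorrTest` results above.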