Examples of independence testing
CorrTest
using Associations
using Random; rng = Xoshiro(1234)
# Some normally distributed data
X = randn(rng, 1000)
Y = 0.5*randn(rng, 1000) .+ X
Z = 0.5*randn(rng, 1000) .+ Y
W = randn(rng, 1000);
Let's test a few independence relationships. For example, we expect that X ⫫ W. We also expect dependence X !⫫ Z, but this dependence should vanish when conditioning on the intermediate variable, so we expect X ⫫ Z | Y.
independence(CorrTest(), X, W)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: 0.025737972061567187
p-value (two-sided): 0.41629610085016866
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the outcome is that we can't reject the null hypothesis that X ⫫ W.
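If we need the p-value programmatically, for example to compare it against a chosen significance level, we can extract it with pvalue, just as in the later examples on this page. A minimal sketch:
result = independence(CorrTest(), X, W)
α = 0.05
pvalue(result) > α # true here, so independence cannot be rejected at this level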
independence(CorrTest(), X, Z)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: 0.8334821068202709
p-value (two-sided): 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
However, we can reject the null hypothesis that X ⫫ Z, so the evidence favors the alternative hypothesis X !⫫ Z.
independence(CorrTest(), X, Z, Y)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: -0.016016159237136465
p-value (two-sided): 0.6132044078198666
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the correlation between X and Z vanishes when conditioning on Y, because Y is solely responsible for the observed correlation between X and Z.
JointDistanceDistributionTest
Bidirectionally coupled logistic maps
Let's use the two bidirectionally coupled logistic maps defined below to create a pair of coupled time series, and use the JointDistanceDistributionTest to see if we can confirm from the observed time series that these variables are bidirectionally coupled. We'll use a significance level of α = 0.01 (i.e. a confidence level of 1 − α = 0.99).
using Associations
using DynamicalSystemsBase
using Random; rng = Xoshiro(1234)
# Parameters for two bidirectionally coupled logistic maps, with coupling
# strengths c_xy, c_yx and dynamical noise levels σ_xy, σ_yx.
Base.@kwdef struct Logistic2Bidir{V, C1, C2, R1, R2, Σx, Σy, R}
xi::V = [0.5, 0.5]
c_xy::C1 = 0.1
c_yx::C2 = 0.1
r₁::R1 = 3.78
r₂::R2 = 3.66
σ_xy::Σx = 0.05
σ_yx::Σy = 0.05
rng::R = Random.default_rng()
end
function system(definition::Logistic2Bidir)
return DiscreteDynamicalSystem(eom_logistic2bidir, definition.xi, definition)
end
function eom_logistic2bidir(u, p::Logistic2Bidir, t)
(; xi, c_xy, c_yx, r₁, r₂, σ_xy, σ_yx, rng) = p
x, y = u
f_xy = (y + c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
f_yx = (x + c_yx*(y + σ_yx * rand(rng)) ) / (1 + c_yx*(1+σ_yx))
dx = r₁ * (f_yx) * (1 - f_yx)
dy = r₂ * (f_xy) * (1 - f_xy)
return SVector{2}(dx, dy)
end
eom_logistic2bidir (generic function with 1 method)
We start by generating some time series and configuring the test.
using Associations
sys = system(Logistic2Bidir(c_xy = 0.5, c_yx = 0.4))
x, y = columns(first(trajectory(sys, 2000, Ttr = 10000)))
measure = JointDistanceDistribution(D = 5, B = 5)
test = JointDistanceDistributionTest(measure)
JointDistanceDistributionTest{JointDistanceDistribution{Euclidean, Float64}, Random.TaskLocalRNG}(JointDistanceDistribution{Euclidean, Float64}(Euclidean(0.0), 5, 5, -1, 0.0), Random.TaskLocalRNG())
Now, we test for independence in both directions.
independence(test, x, y)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ): 0.4738777169259148
p-value: 1.5568381346309224e-5
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
independence(test, y, x)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ): 0.47399837863956673
p-value: 8.735373467905205e-6
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, the null hypothesis is rejected at the pre-determined significance level, and hence we detect directional coupling in both directions.
Non-coupled logistic maps
What happens in the example above if there is no coupling?
sys = system(Logistic2Bidir(c_xy = 0.00, c_yx = 0.0))
x, y = columns(first(trajectory(sys, 1000, Ttr = 10000)));
rxy = independence(test, x, y)
ryx = independence(test, y, x)
pvalue(rxy), pvalue(ryx)
(0.0686678284769523, 0.10720581302326959)
At significance level α = 0.01, we can't reject the null in either direction; hence, there isn't enough evidence in the data to suggest directional coupling.
SurrogateAssociationTest
Chatterjee correlation
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
x, y = rand(rng, 1:50, n), rand(rng, 1:50, n)
test = SurrogateAssociationTest(ChatterjeeCorrelation(), nshuffles = 19)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: -0.02741571909427165
Ensemble quantiles (19 permutations):
(99.9%): 0.02239057022431747
(99%): 0.02191146528779275
(95%): 0.01978211001434957
p-value: 0.8947368421052632
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the test indicates that we can't reject independence. What happens if we introduce a third variable that depends on y?
z = rand(rng, 1:20, n) .* y
independence(test, y, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.33056625232204095
Ensemble quantiles (19 permutations):
(99.9%): 0.036769004097300176
(99%): 0.034546659628417783
(95%): 0.024669573100051766
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
The test clearly picks up on the functional dependence.
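If we only want the value of the Chatterjee correlation itself, without the surrounding hypothesis test, we can compute it directly with association, following the same pattern used for SMeasure later on this page. A minimal sketch:
association(ChatterjeeCorrelation(), y, z) # raw coefficient; values near 0 suggest independence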
Azadkia-Chatterjee coefficient
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
# Some categorical variables (we add a small amount of noise to avoid duplicate points
# during neighbor searches)
x = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
y = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
test = SurrogateAssociationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.04986304986304986
Ensemble quantiles (19 permutations):
(99.9%): 0.10765463965463976
(99%): 0.10117042117042119
(95%): 0.0723516723516724
p-value: 0.10526315789473684
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, the test indicates that we can't reject independence. What happens if we introduce a third variable that depends on y?
z = rand(rng, 1.0:20.0, n) .* y
independence(test, y, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.5954795954795955
Ensemble quantiles (19 permutations):
(99.9%): 0.07963099963099973
(99%): 0.07451827451827453
(95%): 0.05179505179505184
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
The test clearly picks up on the functional dependence. But what about conditioning? Let's define three variables where x → y → z. We then expect a significant association between x and y, possibly between x and z (depending on how strong the intermediate connection is), and a non-significant association between x and z when conditioning on y (since y is the variable connecting x and z). The Azadkia-Chatterjee coefficient should be able to verify these claims.
x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.268282519619418
Ensemble quantiles (19 permutations):
(99.9%): 0.162708521425099
(99%): 0.16135842766858807
(95%): 0.15535801097298424
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
The direct association between x and y is detected.
independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.16285853184248905
Ensemble quantiles (19 permutations):
(99.9%): 0.21660143065490667
(99%): 0.21275366344885063
(95%): 0.19565247586637965
p-value: 0.15789473684210525
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
The indirect association between x and z is not detected here; as anticipated, whether it shows up depends on the strength of the intermediate connection.
independence(test, x, z, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.08940774487471526
Ensemble quantiles (19 permutations):
(99.9%): 0.2442962614147828
(99%): 0.23500413663917583
(95%): 0.19370580430314546
p-value: 0.3157894736842105
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
We can't reject independence between x and z when taking y into consideration, as expected.
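For reference, the raw coefficient can also be computed without the surrogate machinery. A minimal sketch, assuming the three-argument form of association takes the conditioning variable last, like independence does:
association(AzadkiaChatterjeeCoefficient(), x, z, y) # coefficient for x and z, conditioned on y (assumed argument order)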
Distance correlation
using Associations
x = randn(1000)
y = randn(1000) .+ 0.5x
independence(SurrogateAssociationTest(DistanceCorrelation()), x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.43767759894231617
Ensemble quantiles (100 permutations):
(99.9%): 0.09260364765341375
(99%): 0.0839938160513134
(95%): 0.08139371731133146
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
Partial correlation
using Associations
x = randn(1000)
y = randn(1000) .+ 0.5x
z = randn(1000) .+ 0.8y
independence(SurrogateAssociationTest(PartialCorrelation()), x, z, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.01874019898218705
Ensemble quantiles (100 permutations):
(99.9%): 0.06334030165113608
(99%): 0.058527352632148276
(95%): 0.053531604163270255
p-value: 0.74
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
SMeasure
using Associations
x, y = randn(1000), randn(1000)
measure = SMeasure(dx = 4, dy = 3)
s = association(measure, x, y)
0.04368618656574284
The s statistic is larger when there is stronger coupling and smaller when there is weaker coupling. To check whether s is significant (i.e. large enough to claim directional dependence), we can use a SurrogateAssociationTest.
test = SurrogateAssociationTest(measure)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.04368618656574284
Ensemble quantiles (100 permutations):
(99.9%): 0.049114162967789525
(99%): 0.04906587322829774
(95%): 0.04807766019908202
p-value: 0.96
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
The p-value is high, and we can't reject the null at any reasonable significance level. Hence, there isn't evidence in the data to support directional coupling from x to y.
What happens if we use coupled variables?
z = x .+ 0.1y
independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.29401126036521086
Ensemble quantiles (100 permutations):
(99.9%): 0.04892597719048284
(99%): 0.04878882499512336
(95%): 0.048045807242333205
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
Now we can confidently reject the null (independence), and conclude that there is evidence in the data to support directional dependence from x to z.
MIShannon, categorical
In this example, we expect the preference and the food variables to be independent.
using Associations
using Random; rng = Xoshiro(1234)
# Simulate
n = 1000
preference = rand(rng, ["yes", "no"], n)
food = rand(rng, ["veggies", "meat", "fish"], n)
est = JointProbabilities(MIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est)
independence(test, preference, food)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.0020932827876065166
Ensemble quantiles (100 permutations):
(99.9%): 0.006577839452690886
(99%): 0.00513633201600364
(95%): 0.003610525794037448
p-value: 0.18
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, there's not enough evidence to reject the null hypothesis that the variables are independent.
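If we want the mutual information estimate itself, we can evaluate the estimator directly with association; we'd expect a value close to zero here. A minimal sketch:
association(est, preference, food) # raw Shannon MI estimate; ≈ 0 for independent variables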
CMIShannon, categorical
Here, we simulate a survey at a ski resort. The data are such that the place a person grew up in is associated with how many times they fell while skiing. The control happens through an intermediate variable preferred_equipment, which indicates what type of physical activity the person has engaged in previously. Some activities, like skateboarding, lead to better overall balance, so people who are good on a skateboard don't fall, while people who do less challenging activities fall more often.
We should be able to reject places ⫫ experience, but not places ⫫ experience | preferred_equipment. Let's see if we can detect these relationships using (conditional) mutual information.
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
places = rand(rng, ["city", "countryside", "under a rock"], n);
preferred_equipment = map(places) do place
    if place == "city"
        return rand(rng, ["skateboard", "bmx bike"])
    elseif place == "countryside"
        return rand(rng, ["sled", "snowcarpet"])
    else
        return rand(rng, ["private jet", "ferris wheel"])
    end
end;
experience = map(preferred_equipment) do equipment
if equipment ∈ ["skateboard", "bmx bike"]
return "didn't fall"
elseif equipment ∈ ["sled", "snowcarpet"]
return "fell 3 times or less"
else
return "fell uncontably many times"
end
end;
est_mi = JointProbabilities(MIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est_mi)
independence(test, places, experience)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.8812908992306926
Ensemble quantiles (100 permutations):
(99.9%): 0.006425802229305245
(99%): 0.00529371667413731
(95%): 0.0044286060938151265
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
As expected, the evidence favors the alternative hypothesis that places and experience are dependent.
est_cmi = JointProbabilities(CMIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est_cmi)
independence(test, places, experience, preferred_equipment)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.0
Ensemble quantiles (100 permutations):
(99.9%): 4.424313209522354e-16
(99%): 4.3700824022392724e-16
(95%): 4.0786026246571525e-16
p-value: 0.97
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
Again, as expected, when conditioning on the mediating variable, the dependence disappears, and we can't reject the null hypothesis of independence.
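The conditional mutual information estimate itself can also be computed directly. A minimal sketch, assuming the conditioning variable is passed last, as in independence:
association(est_cmi, places, experience, preferred_equipment) # raw CMI estimate; ≈ 0, since the mediator explains the dependence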
MCR
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 300)
y = rand(rng, 300)
test = SurrogateAssociationTest(MCR(r = 0.5); rng, nshuffles = 100, surrogate = RandomShuffle())
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.7366804186026148
Ensemble quantiles (100 permutations):
(99.9%): 0.7411308777632686
(99%): 0.7406848632897479
(95%): 0.7398535484664763
p-value: 0.5
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, we can't reject independence. What happens if two variables are coupled?
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 300)
z = x .+ rand(rng, 300)
test = SurrogateAssociationTest(MCR(r = 0.5); rng, nshuffles = 100, surrogate = RandomShuffle())
independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.8598941266033399
Ensemble quantiles (100 permutations):
(99.9%): 0.7440399457983421
(99%): 0.7438934940356006
(95%): 0.7417528709637129
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
Now, because the variables are coupled, the evidence in the data supports dependence.
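Under the hood, the test compares the observed MCR value against its distribution over shuffled surrogates. The observed value alone can be computed directly; a minimal sketch:
association(MCR(r = 0.5), x, z) # observed mean conditional recurrence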
LocalPermutationTest
To demonstrate the local permutation test for independence, we'll use a chain of unidirectionally coupled logistic maps, which we implement below.
using DynamicalSystemsBase
# Parameters for a chain of four unidirectionally coupled logistic maps X → Y → Z → W.
Base.@kwdef struct Logistic4Chain{V, RX, RY, RZ, RW, C1, C2, C3, Σ1, Σ2, Σ3, RNG}
xi::V = [0.1, 0.2, 0.3, 0.4]
rx::RX = 3.9
ry::RY = 3.6
rz::RZ = 3.6
rw::RW = 3.8
c_xy::C1 = 0.4
c_yz::C2 = 0.4
c_zw::C3 = 0.35
σ_xy::Σ1 = 0.05
σ_yz::Σ2 = 0.05
σ_zw::Σ3 = 0.05
rng::RNG = Random.default_rng()
end
function eom_logistic4_chain(u, p::Logistic4Chain, t)
(; xi, rx, ry, rz, rw, c_xy, c_yz, c_zw, σ_xy, σ_yz, σ_zw, rng) = p
x, y, z, w = u
f_xy = (y + c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
f_yz = (z + c_yz*(y + σ_yz * rand(rng)) ) / (1 + c_yz*(1+σ_yz))
f_zw = (w + c_zw*(z + σ_zw * rand(rng)) ) / (1 + c_zw*(1+σ_zw))
dx = rx * x * (1 - x)
dy = ry * (f_xy) * (1 - f_xy)
dz = rz * (f_yz) * (1 - f_yz)
dw = rw * (f_zw) * (1 - f_zw)
return SVector{4}(dx, dy, dz, dw)
end
function system(definition::Logistic4Chain)
return DiscreteDynamicalSystem(eom_logistic4_chain, definition.xi, definition)
end
system (generic function with 1 method)
CMIShannon
To estimate CMI, we'll use the FPVP estimator, a dedicated nearest-neighbor based estimator of conditional mutual information.
using Associations
using Random; rng = Xoshiro(1234)
n = 100
X = randn(rng, n)
Y = X .+ randn(rng, n) .* 0.4
Z = randn(rng, n) .+ Y
x, y, z = StateSpaceSet.((X, Y, Z))
test = LocalPermutationTest(FPVP(CMIShannon()), nshuffles = 19)
independence(test, x, y, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.38677941849665654
Ensemble quantiles (19 permutations):
(99.9%): -0.7715622374189021
(99%): -0.7731051461840173
(95%): -0.7799625184734186
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
We expect a detectable influence from $X$ to $Y$, whether or not we condition on $Z$, because $Z$ influences neither $X$ nor $Y$. The null hypothesis is that the first two variables are conditionally independent given the third, which we reject with a very low p-value. Hence, we accept the alternative hypothesis that $X$ and $Y$ are conditionally dependent given $Z$.
independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.871101967192972
Ensemble quantiles (19 permutations):
(99.9%): -0.7446264602541617
(99%): -0.7472548301562656
(95%): -0.7589364741656163
p-value: 0.42105263157894735
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, we cannot reject the null hypothesis that $X$ and $Z$ are conditionally independent given $Y$, because $Y$ is the variable that transmits information from $X$ to $Z$.
TEShannon
Here, we demonstrate LocalPermutationTest with the TEShannon measure with default parameters and the FPVP estimator. We'll use the system of four unidirectionally coupled logistic maps X → Y → Z → W defined above.
We expect the transfer entropy X → Z to be non-significant when conditioning on Y, because all information from X to Z is transferred through Y.
using Associations
using Random; rng = Random.default_rng()
n = 300
sys = system(Logistic4Chain(; xi = rand(rng, 4), rng))
x, y, z, w = columns(trajectory(sys, n) |> first)
est = CMIDecomposition(TEShannon(), FPVP(k = 10))
test = LocalPermutationTest(est, nshuffles = 19)
independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.12756338859844138
Ensemble quantiles (19 permutations):
(99.9%): -0.11664061612613967
(99%): -0.11683280443699266
(95%): -0.11768697470745038
p-value: 0.21052631578947367
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
As expected, we cannot reject the null hypothesis that X and Z are conditionally independent given Y.
The same goes for variables one step up the chain:
independence(test, y, w, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.11815791406237483
Ensemble quantiles (19 permutations):
(99.9%): -0.11619634672457665
(99%): -0.11644280359634489
(95%): -0.11753816747087037
p-value: 0.10526315789473684
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
AzadkiaChatterjeeCoefficient
using Associations
using Random; rng = Xoshiro(1234)
n = 300
test = LocalPermutationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
# Three continuous variables forming the chain x → y → z
x = rand(rng, n)
y = rand(rng, n) .* x
z = rand(rng, n) .+ y;
We have defined three variables where x → y → z. We expect a non-significant association between x and z when conditioning on y (since y is the variable connecting x and z).
independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.10557508789552988
Ensemble quantiles (19 permutations):
(99.9%): 0.11010601234030959
(99%): 0.10958642899363939
(95%): 0.10727716967510513
p-value: 0.631578947368421
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
The test verifies our expectation.
SECMITest
JointProbabilities estimation on numeric data
using Associations
using Test
using Random; rng = Xoshiro(1234)
n = 25
x = rand(rng, n)
y = randn(rng, n) .+ x .^ 2
z = randn(rng, n) .* y
# An estimator for estimating the SECMI measure
est = JointProbabilities(SECMI(base = 2), CodifyVariables(ValueBinning(3)))
test = SECMITest(est; nshuffles = 19)
SECMITest{JointProbabilities{ShortExpansionConditionalMutualInformation{Int64}, CodifyVariables{1, ValueBinning{RectangularBinning{Int64}}}, RelativeAmount}, RandomShuffle, Int64, Random.TaskLocalRNG}(JointProbabilities{ShortExpansionConditionalMutualInformation{Int64}, CodifyVariables{1, ValueBinning{RectangularBinning{Int64}}}, RelativeAmount}(ShortExpansionConditionalMutualInformation(; base = 2), CodifyVariables{1, ValueBinning{RectangularBinning{Int64}}}((ValueBinning(binning = RectangularBinning{Int64}(3, false)),)), RelativeAmount()), RandomShuffle(), 19, Random.TaskLocalRNG())
When analyzing $SECMI(x, y | z)$, we expect to reject the null hypothesis (independence), since x and y are connected, regardless of the effect of z.
independence(test, x, y, z)
`SECMITEST` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated parameters:
μ̂ = 0.25805644718504456, σ̂ = 0.0073792890873878095
D𝒩 = 0.47035974938829067, D𝒳² = 0.680538137544112
p-value: 0.0
α = 0.05: ✓ Evidence favors dependence
α = 0.01: ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence
We can detect this association, even for n = 25! When analyzing $SECMI(x, z | y)$, we expect that we can't reject the null (independence), precisely because x and z are not connected when "conditioning away" y.
independence(test, x, z, y)
`SECMITEST` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated parameters:
μ̂ = 0.31004273999202697, σ̂ = 0.013770904101472135
D𝒩 = 0.469898049687456, D𝒳² = 0.646392964764863
p-value: 0.9999999984107697
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected
JointProbabilities estimation on categorical data
Note that this also works for categorical variables. Just use UniqueElements to discretize!
using Associations
using Test
using Random; rng = Xoshiro(1234)
n = 24
x = rand(rng, ["vegetables", "candy"], n)
y = [xᵢ == "candy" && rand(rng) > 0.3 ? "yummy" : "yuck" for xᵢ in x]
z = [yᵢ == "yummy" && rand(rng) > 0.6 ? "grown-up" : "child" for yᵢ in y]
d = CodifyVariables(UniqueElements())
est = JointProbabilities(SECMI(base = 2), d)
independence(SECMITest(est; nshuffles = 19), x, z, y)
`SECMITEST` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated parameters:
μ̂ = 0.0310780574994074, σ̂ = 0.0009485277824744301
D𝒩 = 0.5789473684210527, D𝒳² = 0.7508967119517067
p-value: 1.0
α = 0.05: ✖ Independence cannot be rejected
α = 0.01: ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected