Examples of independence testing

CorrTest

using Associations
using Random; rng = Xoshiro(1234)

# Some normally distributed data
X = randn(rng, 1000)
Y = 0.5*randn(rng, 1000) .+ X
Z = 0.5*randn(rng, 1000) .+ Y
W = randn(rng, 1000)
1000-element Vector{Float64}:
  0.6740170412967725
  0.7629987942399918
 -2.1339097080425504
  0.1591207240795622
  1.0412663555321997
 -0.9251395576320531
  1.3861677761743247
  0.1621593830146287
  1.5175910382337752
 -0.22779737050048576
  ⋮
 -1.447126550264058
  0.7643777251484426
 -1.1217653063785107
  1.4125903864299203
  0.5244479758243477
  0.5962220799023128
  0.049970164703266234
 -0.7466327764477875
  0.8135396561180497

Let's test a few independence relationships. For example, we expect that X ⫫ W. We also expect dependence X !⫫ Z, but this dependence should vanish when conditioning on the intermediate variable, so we expect X ⫫ Z | Y.

independence(CorrTest(), X, W)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: 0.025737972061567187
p-value (two-sided):      0.41629610085016866
  α = 0.05:  ✖ Independence cannot be rejected
  α = 0.01:  ✖ Independence cannot be rejected
  α = 0.001: ✖ Independence cannot be rejected

As expected, the outcome is that we can't reject the null hypothesis that X ⫫ W.
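
If you want the p-value programmatically rather than reading the printed summary, the result returned by independence supports pvalue (it is used this way later on this page). A minimal sketch:

res = independence(CorrTest(), X, W)
pvalue(res)  # large p-value: X ⫫ W cannot be rejected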

independence(CorrTest(), X, Z)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: 0.8334821068202709
p-value (two-sided):      0.0
  α = 0.05:  ✓ Evidence favors dependence
  α = 0.01:  ✓ Evidence favors dependence
  α = 0.001: ✓ Evidence favors dependence

However, we can reject the null hypothesis that X ⫫ Z, so the evidence favors the alternative hypothesis X !⫫ Z.

independence(CorrTest(), X, Z, Y)
`CorrTest` independence test
----------------------------------------------------------------------------------
H₀: "The first two variables are independent (given the 3rd variable, if relevant)"
H₁: "The first two variables are dependent (given the 3rd variable, if relevant)"
----------------------------------------------------------------------------------
ρ, (partial) correlation: -0.016016159237136465
p-value (two-sided):      0.6132044078198666
  α = 0.05:  ✖ Independence cannot be rejected
  α = 0.01:  ✖ Independence cannot be rejected
  α = 0.001: ✖ Independence cannot be rejected

As expected, the correlation between X and Z vanishes when we condition on Y, because Y alone is responsible for the observed correlation between X and Z.
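
The partial correlation coefficient can also be computed on its own, without the hypothesis test. A minimal sketch, assuming association (used later on this page for pairwise measures) also accepts a third, conditioning input for conditional measures like PartialCorrelation:

association(PartialCorrelation(), X, Z, Y)  # should match the ρ reported above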

JointDistanceDistributionTest

Bidirectionally coupled logistic maps

Let's use a system of bidirectionally coupled logistic maps (defined below as Logistic2Bidir) to create a pair of coupled time series, and use the JointDistanceDistributionTest to see if we can confirm from the observed time series that these variables are bidirectionally coupled. We'll use a confidence level of 1 - α = 0.99, i.e. a significance level of α = 0.01.

using Associations
using DynamicalSystemsBase
using Random; rng = Xoshiro(1234)
Base.@kwdef struct Logistic2Bidir{V, C1, C2, R1, R2, Σx, Σy, R}
    xi::V = [0.5, 0.5]
    c_xy::C1 = 0.1
    c_yx::C2 = 0.1
    r₁::R1 = 3.78
    r₂::R2 = 3.66
    σ_xy::Σx = 0.05
    σ_yx::Σy = 0.05
    rng::R = Random.default_rng()
end

function system(definition::Logistic2Bidir)
    return DiscreteDynamicalSystem(eom_logistic2bidir, definition.xi, definition)
end

function eom_logistic2bidir(u, p::Logistic2Bidir, t)
    (; xi, c_xy, c_yx, r₁, r₂, σ_xy, σ_yx, rng) = p
    x, y = u
    f_xy = (y +  c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
    f_yx = (x +  c_yx*(y + σ_yx * rand(rng)) ) / (1 + c_yx*(1+σ_yx))
    dx = r₁ * (f_yx) * (1 - f_yx)
    dy = r₂ * (f_xy) * (1 - f_xy)
    return SVector{2}(dx, dy)
end
eom_logistic2bidir (generic function with 1 method)

We start by generating some time series and configuring the test.

using Associations
sys = system(Logistic2Bidir(c_xy = 0.5, c_yx = 0.4))
x, y = columns(first(trajectory(sys, 2000, Ttr = 10000)))
measure = JointDistanceDistribution(D = 5, B = 5)
test = JointDistanceDistributionTest(measure)
JointDistanceDistributionTest{JointDistanceDistribution{Euclidean, Float64}, Random.TaskLocalRNG}(JointDistanceDistribution{Euclidean, Float64}(Euclidean(0.0), 5, 5, -1, 0.0), Random.TaskLocalRNG())

Now, we test for independence in both directions.

independence(test, x, y)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ):     0.46593119524053234
p-value:           1.3854593215678612e-5
  α = 0.05:  ✓ Evidence favors dependence
  α = 0.01:  ✓ Evidence favors dependence
  α = 0.001: ✓ Evidence favors dependence
independence(test, y, x)
`JointDistanceDistributionTest` independence test
----------------------------------------------------------------------------------
H₀: μ(Δ) = 0 (the input variables are independent)
H₁: μ(Δ) > 0 (there is directional dependence between the input variables)
----------------------------------------------------------------------------------
Hypothetical μ(Δ): 0.0
Observed μ(Δ):     0.4731496343162716
p-value:           1.8519256692606234e-5
  α = 0.05:  ✓ Evidence favors dependence
  α = 0.01:  ✓ Evidence favors dependence
  α = 0.001: ✓ Evidence favors dependence

As expected, the null hypothesis is rejected at the predetermined significance level for both tests, and hence we detect directional coupling in both directions.

Non-coupled logistic maps

What happens in the example above if there is no coupling?

sys = system(Logistic2Bidir(c_xy = 0.00, c_yx = 0.0))
x, y = columns(first(trajectory(sys, 1000, Ttr = 10000)));
rxy = independence(test, x, y)
ryx = independence(test, y, x)
pvalue(rxy), pvalue(ryx)
(0.0686678284769523, 0.10720581302326959)

At the α = 0.01 significance level, we can't reject the null hypothesis in either direction, so there isn't enough evidence in the data to suggest directional coupling.
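
To turn these p-values into an explicit decision at α = 0.01, a minimal sketch:

α = 0.01
pvalue(rxy) ≥ α && pvalue(ryx) ≥ α  # true: the null cannot be rejected either way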

SurrogateAssociationTest

Chatterjee correlation

using Associations
using Random; rng = Xoshiro(1234)
n = 1000
x, y = rand(rng, 1:50, n), rand(rng, 1:50, n)
test = SurrogateAssociationTest(ChatterjeeCorrelation(), nshuffles = 19)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.021078956684557504
Ensemble quantiles (19 permutations):
    (99.9%): 0.026352104269524005
    (99%):   0.025468837743600887
    (95%):   0.021543208739498193
p-value:   0.05263157894736842
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, the test indicates that we can't reject independence. What happens if we introduce a third variable that depends on y?

z = rand(rng, 1:20, n) .* y
independence(test, y, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.32071924627680715
Ensemble quantiles (19 permutations):
    (99.9%): 0.03836686130571781
    (99%):   0.03833620215396191
    (95%):   0.038199939257269
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

The test clearly picks up on the functional dependence.
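
The raw Chatterjee correlation can also be computed by itself, without the surrogate test, using association as elsewhere on this page. A minimal sketch (the value should match the Estimated: line above, up to the coefficient's random tie-breaking):

association(ChatterjeeCorrelation(), y, z)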

Azadkia-Chatterjee coefficient

using Associations
using Random; rng = Xoshiro(1234)
n = 1000
# Some categorical variables (we add a small amount of noise to avoid duplicate points
# during neighbor searches)
x = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
y = rand(rng, 1.0:50.0, n) .+ rand(rng, n) .* 1e-8
test = SurrogateAssociationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: -0.01088101088101088
Ensemble quantiles (19 permutations):
    (99.9%): 0.059265071265071294
    (99%):   0.057213177213177216
    (95%):   0.048093648093648106
p-value:   0.8421052631578947
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, the test indicates that we can't reject independence. What happens if we introduce a third variable that depends on y?

z = rand(rng, 1.0:20.0, n) .* y
independence(test, y, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.5954735954735955
Ensemble quantiles (19 permutations):
    (99.9%): 0.06263511863511863
    (99%):   0.0626176226176226
    (95%):   0.06253986253986253
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

The test clearly picks up on the functional dependence. But what about conditioning? Let's define three variables where x → y → z. We then expect a significant association between x and y, possibly between x and z (depending on how strong the intermediate connection is), and a non-significant association between x and z when conditioning on y (since y is the variable connecting x and z). The Azadkia-Chatterjee coefficient should also be able to verify these claims.

x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.268282519619418
Ensemble quantiles (19 permutations):
    (99.9%): 0.11942134870477125
    (99%):   0.11476352524480868
    (95%):   0.09406208764497538
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

The direct association between x and y is detected.

independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.16285853184248905
Ensemble quantiles (19 permutations):
    (99.9%): 0.14869504826724086
    (99%):   0.14498229043683591
    (95%):   0.12848114452392528
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

The indirect association between x and z is also detected.

independence(test, x, z, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.08940774487471526
Ensemble quantiles (19 permutations):
    (99.9%): 0.1642554107122432
    (99%):   0.16422711092471343
    (95%):   0.1641013340912478
p-value:   0.15789473684210525
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, we can't reject independence between x and z when conditioning on y.

Distance correlation

using Associations
x = randn(1000)
y = randn(1000) .+ 0.5x
independence(SurrogateAssociationTest(DistanceCorrelation()), x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.4000819865961263
Ensemble quantiles (100 permutations):
    (99.9%): 0.09829456534594846
    (99%):   0.09405209386603855
    (95%):   0.07946982002531558
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

Partial correlation

using Associations
x = randn(1000)
y = randn(1000) .+ 0.5x
z = randn(1000) .+ 0.8y
independence(SurrogateAssociationTest(PartialCorrelation()), x, z, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.005232884464733854
Ensemble quantiles (100 permutations):
    (99.9%): 0.10780800187947868
    (99%):   0.071347093061136
    (95%):   0.05128506749699176
p-value:   0.58
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

SMeasure

using Associations
x, y = randn(1000), randn(1000)
measure = SMeasure(dx = 4, dy = 3)
s = association(measure, x, y)
0.04349104289364991

The s statistic is larger when there is stronger coupling and smaller when there is weaker coupling. To check whether s is significant (i.e. large enough to claim directional dependence), we can use a SurrogateAssociationTest.

test = SurrogateAssociationTest(measure)
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.04349104289364991
Ensemble quantiles (100 permutations):
    (99.9%): 0.04851040501856842
    (99%):   0.048277197295723454
    (95%):   0.047556445298110546
p-value:   0.94
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

The p-value is high, and we can't reject the null at any reasonable significance level. Hence, there isn't evidence in the data to support directional coupling from x to y.

What happens if we use coupled variables?

z = x .+ 0.1y
independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.2810827901317333
Ensemble quantiles (100 permutations):
    (99.9%): 0.047992531456434055
    (99%):   0.0478242772711801
    (95%):   0.04752460685307059
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

Now we can confidently reject the null (independence), and conclude that there is evidence in the data to support directional dependence from x to z.

MIShannon, categorical

In this example, we expect the preference and the food variables to be independent.

using Associations
using Random; rng = Xoshiro(1234)
# Simulate
n = 1000
preference = rand(rng, ["yes", "no"], n)
food = rand(rng, ["veggies", "meat", "fish"], n)
est = JointProbabilities(MIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est)
independence(test, preference, food)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.0020932827876065166
Ensemble quantiles (100 permutations):
    (99.9%): 0.007872700372189418
    (99%):   0.00775389007729483
    (95%):   0.005703275167612416
p-value:   0.29
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, there's not enough evidence to reject the null hypothesis that the variables are independent.

CMIShannon, categorical

Here, we simulate a survey at a ski resort. The data are constructed such that the place where a person grew up is associated with how many times they fell while skiing. The association is mediated by the variable preferred_equipment, which indicates what type of physical activity the person has engaged with in the past. Some activities, like skateboarding, lead to better overall balance, so people who are good on a skateboard don't fall, while people who do less challenging activities fall more often.

We should be able to reject places ⫫ experience, but not reject places ⫫ experience | preferred_equipment. Let's see if we can detect these relationships using (conditional) mutual information.

using Associations
using Random; rng = Xoshiro(1234)
n = 1000

places = rand(rng, ["city", "countryside", "under a rock"], n);
preferred_equipment = map(places) do place
    if place == "city"
        return rand(rng, ["skateboard", "bmx bike"])
    elseif place == "countryside"
        return rand(rng, ["sled", "snowcarpet"])
    else
        return rand(rng, ["private jet", "ferris wheel"])
    end
end;
experience = map(preferred_equipment) do equipment
    if equipment ∈ ["skateboard", "bmx bike"]
        return "didn't fall"
    elseif equipment ∈ ["sled", "snowcarpet"]
        return "fell 3 times or less"
    else
        return "fell uncontably many times"
    end
end;
est_mi = JointProbabilities(MIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est_mi)
independence(test, places, experience)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.8812908992306926
Ensemble quantiles (100 permutations):
    (99.9%): 0.006187801816714845
    (99%):   0.005805970598888199
    (95%):   0.003970392742710654
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

As expected, the evidence favors the alternative hypothesis that places and experience are dependent.

est_cmi = JointProbabilities(CMIShannon(), CodifyVariables(UniqueElements()))
test = SurrogateAssociationTest(est_cmi)
independence(test, places, experience, preferred_equipment)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: 0.0
Ensemble quantiles (100 permutations):
    (99.9%): 4.3871086141070485e-16
    (99%):   4.372837349032553e-16
    (95%):   3.936851001863341e-16
p-value:   1.0
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

Again, as expected, when conditioning on the mediating variable, the dependence disappears, and we can't reject the null hypothesis of independence.

MCR

using Associations
using Random; rng = Xoshiro(1234)

x = rand(rng, 300)
y = rand(rng, 300)
test = SurrogateAssociationTest(MCR(r = 0.5); rng, nshuffles = 100, surrogate = RandomShuffle())
independence(test, x, y)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.7366804186026148
Ensemble quantiles (100 permutations):
    (99.9%): 0.7411308777632686
    (99%):   0.7406848632897479
    (95%):   0.7398535484664763
p-value:   0.5
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, we can't reject independence. What happens if two variables are coupled?

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 300)
z = x .+ rand(rng, 300)
test = SurrogateAssociationTest(MCR(r = 0.5); rng, nshuffles = 100, surrogate = RandomShuffle())
independence(test, x, z)
`SurrogateAssociationTest` independence test
---------------------------------------------------------------------
H₀: "The variables are independent"
Hₐ: "The variables are dependent"
---------------------------------------------------------------------
Estimated: 0.8598941266033399
Ensemble quantiles (100 permutations):
    (99.9%): 0.7440399457983421
    (99%):   0.7438934940356006
    (95%):   0.7417528709637129
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

Now, because the variables are coupled, the evidence in the data supports dependence.

LocalPermutationTest

To demonstrate the local permutation test for independence, we'll use a chain of unidirectionally coupled logistic maps, implemented below.

using DynamicalSystemsBase
using Random
Base.@kwdef struct Logistic4Chain{V, RX, RY, RZ, RW, C1, C2, C3, Σ1, Σ2, Σ3, RNG}
    xi::V = [0.1, 0.2, 0.3, 0.4]
    rx::RX = 3.9
    ry::RY = 3.6
    rz::RZ = 3.6
    rw::RW = 3.8
    c_xy::C1 = 0.4
    c_yz::C2 = 0.4
    c_zw::C3 = 0.35
    σ_xy::Σ1 = 0.05
    σ_yz::Σ2 = 0.05
    σ_zw::Σ3 = 0.05
    rng::RNG = Random.default_rng()
end

function eom_logistic4_chain(u, p::Logistic4Chain, t)
    (; xi, rx, ry, rz, rw, c_xy, c_yz, c_zw, σ_xy, σ_yz, σ_zw, rng) = p
    x, y, z, w = u
    f_xy = (y +  c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
    f_yz = (z +  c_yz*(y + σ_yz * rand(rng)) ) / (1 + c_yz*(1+σ_yz))
    f_zw = (w +  c_zw*(z + σ_zw * rand(rng)) ) / (1 + c_zw*(1+σ_zw))
    dx = rx * x * (1 - x)
    dy = ry * (f_xy) * (1 - f_xy)
    dz = rz * (f_yz) * (1 - f_yz)
    dw = rw * (f_zw) * (1 - f_zw)
    return SVector{4}(dx, dy, dz, dw)
end


function system(definition::Logistic4Chain)
    return DiscreteDynamicalSystem(eom_logistic4_chain, definition.xi, definition)
end
system (generic function with 1 method)

CMIShannon

To estimate CMI, we'll use the FPVP estimator, a dedicated nearest-neighbor-based estimator of conditional mutual information.

using Associations
using Random; rng = Xoshiro(1234)
n = 100
X = randn(rng, n)
Y = X .+ randn(rng, n) .* 0.4
Z = randn(rng, n) .+ Y
x, y, z = StateSpaceSet.((X, Y, Z))
test = LocalPermutationTest(FPVP(CMIShannon()), nshuffles = 19)
independence(test, x, y, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.38677941849665654
Ensemble quantiles (19 permutations):
    (99.9%): -0.731915090024784
    (99%):   -0.7372511896014173
    (95%):   -0.7609671877197877
p-value:   0.0
α = 0.05:  ✓ Evidence favors dependence
α = 0.01:  ✓ Evidence favors dependence
α = 0.001: ✓ Evidence favors dependence

We expect there to be a detectable influence from $X$ to $Y$, whether or not we condition on $Z$, because $Z$ influences neither $X$ nor $Y$. The null hypothesis is that the first two variables are conditionally independent given the third, which we reject with a very low p-value. Hence, we accept the alternative hypothesis that $X$ and $Y$ are conditionally dependent given $Z$.

independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.871101967192972
Ensemble quantiles (19 permutations):
    (99.9%): -0.7733277601425634
    (99%):   -0.7780999511403291
    (95%):   -0.7993096889081768
p-value:   0.2631578947368421
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, we cannot reject the null hypothesis that $X$ and $Z$ are conditionally independent given $Y$, because $Y$ is the variable that transmits information from $X$ to $Z$.
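
If you only need the raw CMI estimates, association accepts the estimator directly. A minimal sketch (nearest-neighbor estimates can be negative, as the summaries above show, so the permutation test, not the sign of the estimate, provides the decision rule):

association(FPVP(CMIShannon()), x, y, z)  # estimate of I(X; Y | Z)
association(FPVP(CMIShannon()), x, z, y)  # estimate of I(X; Z | Y)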

TEShannon

Here, we demonstrate LocalPermutationTest with the TEShannon measure (default parameters) and the FPVP estimator. We'll use the system of four unidirectionally coupled logistic maps X → Y → Z → W defined above.

We should expect the transfer entropy X → Z to be non-significant when conditioning on Y, because all information from X to Z is transferred through Y.

using Associations
using Random; rng = Random.default_rng()
n = 300
sys = system(Logistic4Chain(; xi = rand(rng, 4), rng))
x, y, z, w = columns(trajectory(sys, n) |> first)
est = CMIDecomposition(TEShannon(), FPVP(k = 10))
test = LocalPermutationTest(est, nshuffles = 19)
independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.1436771628302946
Ensemble quantiles (19 permutations):
    (99.9%): -0.11915758578683851
    (99%):   -0.11931348602530281
    (95%):   -0.12000637597403307
p-value:   0.7368421052631579
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

As expected, we cannot reject the null hypothesis that X and Z are conditionally independent given Y.

The same goes for variables one step up the chain:

independence(test, y, w, z)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.12265687091712067
Ensemble quantiles (19 permutations):
    (99.9%): -0.11172105268956835
    (99%):   -0.11263344290314094
    (95%):   -0.11668851051901905
p-value:   0.10526315789473684
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

AzadkiaChatterjeeCoefficient

using Associations
using Random; rng = Xoshiro(1234)
n = 300
test = LocalPermutationTest(AzadkiaChatterjeeCoefficient(), nshuffles = 19)
# Three continuous variables forming the chain x → y → z
x = rand(rng, n)
y = rand(rng, n) .* x
z = rand(rng, n) .+ y
300-element Vector{Float64}:
 0.8400073625903303
 0.6760999796587334
 1.2970222394223172
 0.28046494723059
 0.07112406664985896
 1.1729404638906242
 0.856424638718228
 1.310344128498991
 0.8823353911198246
 1.177764080080661
 ⋮
 0.9457388135862522
 1.0644831431175747
 1.243107951606951
 0.6538499522501744
 0.9949521527315198
 0.21574568394173727
 1.2125757185784312
 0.6267282902318254
 0.5266964509997589

Above, we defined three variables where x → y → z. We expect a non-significant association between x and z when conditioning on y (since y is the variable connecting x and z).

independence(test, x, z, y)
`LocalPermutationTest` independence test
---------------------------------------------------------------------
H₀: "The first two variables are independent, given the 3rd variable"
Hₐ: "The first two variables are dependent, given the 3rd variable"
---------------------------------------------------------------------
Estimated: -0.10557508789552988
Ensemble quantiles (19 permutations):
    (99.9%): 0.14996961090103622
    (99%):   0.13463334750408343
    (95%):   0.06647217685096114
p-value:   0.6842105263157895
α = 0.05:  ✖ Independence cannot be rejected
α = 0.01:  ✖ Independence cannot be rejected
α = 0.001: ✖ Independence cannot be rejected

The test verifies our expectation.