Examples of association measure estimation
HellingerDistance
From precomputed probabilities
using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(HellingerDistance(), p1, p2)
0.055856605640373105
JointProbabilities
+ OrdinalPatterns
We expect the Hellinger distance between two uncorrelated variables to be close to zero.
using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(HellingerDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
3.1541797712493756e-6
KLDivergence
From precomputed probabilities
using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(KLDivergence(), p1, p2)
0.20998654701098748
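For reference, the value above corresponds to $D_{KL}(p_1 \| p_2) = \sum_i p_{1,i} \log_2 \frac{p_{1,i}}{p_{2,i}} = 0.1 \log_2\frac{0.1}{0.3} + 0.5 \log_2\frac{0.5}{0.3} \approx 0.21$ bits; the terms where the two PMFs agree vanish.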
JointProbabilities
+ OrdinalPatterns
We expect the KLDivergence
between two uncorrelated variables to be close to zero.
using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(KLDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
1.2875933381844337e-5
RenyiDivergence
From precomputed probabilities
using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(RenyiDivergence(), p1, p2)
0.11627470204599202
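For reference, the Rényi divergence is $D_q(p_1 \| p_2) = \frac{1}{q-1} \log \sum_i p_{1,i}^{q} p_{2,i}^{1-q}$. The value above is consistent with the defaults $q = 1/2$ and base-2 logarithms, for which the expression reduces to $-2 \log_2 \sum_i \sqrt{p_{1,i} p_{2,i}} \approx 0.116$.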
JointProbabilities
+ OrdinalPatterns
We expect the RenyiDivergence
between two uncorrelated variables to be close to zero.
using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(RenyiDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
6.4354135893600935e-6
VariationDistance
From precomputed probabilities
using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(VariationDistance(), p1, p2)
0.2
JointProbabilities
+ OrdinalPatterns
We expect the VariationDistance
between two uncorrelated variables to be close to zero.
using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(VariationDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero
0.001530030600612034
JointEntropyShannon
JointProbabilities
with Dispersion
using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyShannon()
discretization = CodifyVariables(Dispersion(m = 2, c = 3))
est = JointProbabilities(measure, discretization)
association(est, x, y)
3.115303256966459
JointEntropyTsallis
JointProbabilities
with OrdinalPatterns
using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyTsallis()
discretization = CodifyVariables(OrdinalPatterns(m = 3))
est = JointProbabilities(measure, discretization)
association(est, x, y)
2.3340683377759546
JointEntropyRenyi
JointProbabilities
with OrdinalPatterns
using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyRenyi(q = 0.5)
discretization = CodifyVariables(ValueBinning(2))
est = JointProbabilities(measure, discretization)
association(est, x, y)
1.9472483134811365
ConditionalEntropyShannon
Analytical examples
This is essentially example 2.2.1 in Cover & Thomas (2006), where they use the following relative frequency table as an example. Note that Julia is column-major, so we need to transpose their example. Their X then corresponds to the first dimension of our table and their Y to the second dimension.
using Associations
freqs_yx = [1//8 1//16 1//32 1//32;
1//16 1//8 1//32 1//32;
1//16 1//16 1//16 1//16;
1//4 0//1 0//1 0//1];
# `freqs_yx` is already normalized, so we can feed it directly to `Probabilities`
pxy = Probabilities(freqs_yx)
4×4 Probabilities{Rational{Int64},2}
Outcome(1) Outcome(2) Outcome(3) Outcome(4)
Outcome(1) 1//8 1//16 1//32 1//32
Outcome(2) 1//16 1//8 1//32 1//32
Outcome(3) 1//16 1//16 1//16 1//16
Outcome(4) 1//4 0//1 0//1 0//1
The marginal distribution for x
(first dimension) is
marginal(pxy, dims = 2)
Probabilities{Rational{Int64},1} over 4 outcomes
Outcome(1) 1//2
Outcome(2) 1//4
Outcome(3) 1//8
Outcome(4) 1//8
The marginal distribution for y
(second dimension) is
marginal(pxy, dims = 1)
Probabilities{Rational{Int64},1} over 4 outcomes
Outcome(1) 1//4
Outcome(2) 1//4
Outcome(3) 1//4
Outcome(4) 1//4
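For reference, the conditional entropy is related to the joint and marginal entropies through the chain rule $H^S(X | Y) = H^S(X, Y) - H^S(Y)$.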
And the Shannon conditional entropy $H^S(X | Y)$
ce_x_given_y = association(ConditionalEntropyShannon(), pxy) |> Rational
13//8
This is the same as in their example. Hooray! To compute $H^S(Y | X)$, we just need to flip the contingency matrix.
pyx = Probabilities(transpose(freqs_yx))
ce_y_given_x = association(ConditionalEntropyShannon(), pyx) |> Rational
11//8
JointProbabilities
+ CodifyVariables
+ UniqueElements
We can of course also estimate conditional entropy from data. To do so, we'll use the JointProbabilities
estimator, which constructs a multivariate PMF for us. Thus, we don't explicitly need a set of counts, like in the example above, because they are estimated under the hood for us.
Let's first demonstrate on some categorical data. For that, we must use UniqueElements
as the discretization (i.e. just count unique elements).
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyShannon(), disc)
association(est, rating, movie)
2.5744931842297527
JointProbabilities
+ CodifyPoints
+ UniqueElementsEncoding
using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyShannon(), disc);
association(est, X, Y)
1.8234663200044612
ConditionalEntropyTsallisAbe
JointProbabilities
+ CodifyVariables
+ UniqueElements
We'll here repeat the analysis we did for ConditionalEntropyShannon
above.
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisAbe(q =1.5), disc)
association(est, rating, movie)
1.7608975191987752
JointProbabilities
+ CodifyPoints
+ UniqueElementsEncoding
using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisAbe(q = 1.5), disc);
association(est, X, Y)
1.8293074543081442
ConditionalEntropyTsallisFuruichi
JointProbabilities
+ CodifyVariables
+ UniqueElements
We'll here repeat the analysis we did for ConditionalEntropyShannon
and ConditionalEntropyTsallisAbe
above.
using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)
disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q =0.5), disc)
association(est, rating, movie)
4.087134896624493
JointProbabilities
+ CodifyPoints
+ UniqueElementsEncoding
using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q = 0.5), disc);
association(est, X, Y)
6.6943366123303045
MIShannon
JointProbabilities
+ ValueBinning
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
y = rand(rng, 1000)
discretization = CodifyVariables(ValueBinning(FixedRectangularBinning(0, 1, 5)))
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)
0.0026305563117749483
JointProbabilities
+ UniqueElements
The JointProbabilities
estimator can also be used with categorical data. For example, let's compute the Shannon mutual information between the food preferences of a population sample and two food variables: one chosen at random, and one biased by the preferences.
using Associations
n = 1000
preferences = rand(["neutral", "like it", "hate it"], n);
random_foods = rand(["water", "flour", "bananas", "booze", "potatoes", "beans", "soup"], n)
biased_foods = map(preferences) do preference
if cmp(preference, "neutral") == 1
return rand(["water", "flour"])
elseif cmp(preference, "like it") == 1
return rand(["bananas", "booze"])
else
return rand(["potatoes", "beans", "soup"])
end
end
est = JointProbabilities(MIShannon(), UniqueElements())
association(est, preferences, biased_foods), association(est, preferences, random_foods)
(0.9202998866042832, 0.009379638426881895)
Dedicated GaussianMI
estimator
using Associations
using Distributions
using Statistics
n = 1000
using Associations
x = randn(1000)
y = rand(1000) .+ x
association(GaussianMI(MIShannon()), x, y) # defaults to `MIShannon()`
1.8434735588949658
Dedicated KraskovStögbauerGrassberger1
estimator
using Associations
x, y = rand(1000), rand(1000)
association(KSG1(MIShannon(); k = 5), x, y)
-0.05615849407981694
Dedicated KraskovStögbauerGrassberger2
estimator
using Associations
x, y = rand(1000), rand(1000)
association(KSG2(MIShannon(); k = 5), x, y)
-0.01874060526242381
Dedicated GaoKannanOhViswanath
estimator
using Associations
x, y = rand(1000), rand(1000)
association(GaoKannanOhViswanath(MIShannon(); k = 10), x, y)
0.02772673069868732
EntropyDecomposition
+ Kraskov
We can compute MIShannon
by naively applying a DifferentialInfoEstimator
. Note that this doesn't apply any bias correction.
using Associations
x, y = rand(1000), rand(1000)
association(EntropyDecomposition(MIShannon(), Kraskov(k = 3)), x, y)
-0.08648156338723614
EntropyDecomposition
+ BubbleSortSwaps
We can also compute MIShannon
by naively applying a DiscreteInfoEstimator
. Note that this doesn't apply any bias correction.
using Associations
x, y = rand(1000), rand(1000)
disc = CodifyVariables(BubbleSortSwaps(m=5))
hest = PlugIn(Shannon())
association(EntropyDecomposition(MIShannon(), hest, disc), x, y)
0.07642338513217162
EntropyDecomposition
+ Jackknife
+ ValueBinning
Shannon mutual information can be written as a sum of marginal entropy terms. Here, we use CodifyVariables with ValueBinning to bin the data and compute discrete Shannon mutual information.
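Concretely, $I^S(X; Y) = H^S(X) + H^S(Y) - H^S(X, Y)$, so the mutual information can be computed from three entropy estimates.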
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 50)
y = rand(rng, 50)
# Use the H3-estimation method with a discrete visitation frequency based
# probabilities estimator over a fixed grid covering the range of the data,
# which is on [0, 1].
discretization = CodifyVariables(ValueBinning(FixedRectangularBinning(0, 1, 5)))
hest = Jackknife(Shannon())
est = EntropyDecomposition(MIShannon(), hest, discretization)
association(est, x, y)
-0.03746574786856627
Reproducing Kraskov et al. (2004)
Here, we'll reproduce Figure 4 from Kraskov et al. (2004)'s seminal paper on the nearest-neighbor based mutual information estimator. We'll estimate the mutual information between marginals of a bivariate Gaussian for a fixed time series length (800 points here), varying the number of neighbors. Note: in the original paper, they show multiple curves corresponding to different time series lengths. We only show two single curves: one for the KraskovStögbauerGrassberger1
estimator and one for the KraskovStögbauerGrassberger2
estimator.
using Associations
using LinearAlgebra: det
using Distributions: MvNormal
using StateSpaceSets: StateSpaceSet
using CairoMakie
using Statistics
N = 800
c = 0.9
Σ = [1 c; c 1]
N2 = MvNormal([0, 0], Σ)
mitrue = -0.5*log(det(Σ)) # in nats
ks = [2; 5; 7; 10:10:70] .* 2
nreps = 10 # plot average over 10 independent realizations
mis_ksg1 = zeros(nreps, length(ks))
mis_ksg2 = zeros(nreps, length(ks))
for i = 1:nreps
D2 = StateSpaceSet([rand(N2) for i = 1:N])
X = D2[:, 1] |> StateSpaceSet
Y = D2[:, 2] |> StateSpaceSet
for (j, k) in enumerate(ks)
est1 = KSG1(MIShannon(; base = ℯ); k)
est2 = KSG2(MIShannon(; base = ℯ); k)
mis_ksg1[i, j] = association(est1, X, Y)
mis_ksg2[i, j] = association(est2, X, Y)
end
end
fig = Figure()
ax = Axis(fig[1, 1], xlabel = "k / N", ylabel = "Mutual information (nats)")
scatterlines!(ax, ks ./ N, mean(mis_ksg1, dims = 1) |> vec, label = "KSG1")
scatterlines!(ax, ks ./ N, mean(mis_ksg2, dims = 1) |> vec, label = "KSG2")
hlines!(ax, [mitrue], color = :black, linewidth = 3, label = "I (true)")
axislegend()
fig
Estimator comparison for MIShannon
Most estimators suffer from significant bias when applied to discrete, finite data. One possible resolution is to add a small amount of noise to discrete variables, so that the data becomes continuous in practice.
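For instance, a tiny jitter could look like the following sketch (a hypothetical example; the noise amplitude must be negligible compared to the spacing between the discrete values):
x_discrete = float.(rand(1:8, 1000))            # discrete data, cast to floats
x_jittered = x_discrete .+ 1e-8 .* randn(1000)  # tiny noise breaks ties, making the data effectively continuous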
But instead of adding noise to your data, you can also consider using an estimator that is specifically designed to deal with continuous-discrete mixture data. One example is the GaoKannanOhViswanath
estimator. Below, we compare its performance to KraskovStögbauerGrassberger1
on uniformly distributed discrete multivariate data. The true mutual information is zero. While the "naive" KraskovStögbauerGrassberger1
estimator diverges from the true value for these data, the GaoKannanOhViswanath
converges to the true value.
using Associations
using Statistics
using StateSpaceSets: StateSpaceSet
using Statistics: mean
using CairoMakie
function compare_ksg_gkov(;
k = 5,
base = 2,
nreps = 10,
Ls = [500:100:1000; 1500; 2500; 5000; 7000])
mis_ksg1_mix = zeros(nreps, length(Ls))
mis_ksg1_discrete = zeros(nreps, length(Ls))
mis_ksg1_cont = zeros(nreps, length(Ls))
mis_gkov_mix = zeros(nreps, length(Ls))
mis_gkov_discrete = zeros(nreps, length(Ls))
mis_gkov_cont = zeros(nreps, length(Ls))
for (j, L) in enumerate(Ls)
for i = 1:nreps
X = StateSpaceSet(float.(rand(1:8, L, 2)))
Y = StateSpaceSet(float.(rand(1:8, L, 2)))
Z = StateSpaceSet(rand(L, 2))
W = StateSpaceSet(rand(L, 2))
est_gkov = GaoKannanOhViswanath(MIShannon(; base = ℯ); k)
est_ksg1 = KSG1(MIShannon(; base = ℯ); k)
mis_ksg1_discrete[i, j] = association(est_ksg1, X, Y)
mis_gkov_discrete[i, j] = association(est_gkov, X, Y)
mis_ksg1_mix[i, j] = association(est_ksg1, X, Z)
mis_gkov_mix[i, j] = association(est_gkov, X, Z)
mis_ksg1_cont[i, j] = association(est_ksg1, Z, W)
mis_gkov_cont[i, j] = association(est_gkov, Z, W)
end
end
return mis_ksg1_mix, mis_ksg1_discrete, mis_ksg1_cont,
mis_gkov_mix, mis_gkov_discrete, mis_gkov_cont
end
fig = Figure()
ax = Axis(fig[1, 1],
xlabel = "Sample size",
ylabel = "Mutual information (bits)")
Ls = [100; 200; 500; 1000; 2500; 5000; 7000]
nreps = 5
k = 3
mis_ksg1_mix, mis_ksg1_discrete, mis_ksg1_cont,
mis_gkov_mix, mis_gkov_discrete, mis_gkov_cont =
compare_ksg_gkov(; nreps, k, Ls)
scatterlines!(ax, Ls, mean(mis_ksg1_mix, dims = 1) |> vec,
label = "KSG1 (mixed)", color = :black,
marker = :utriangle)
scatterlines!(ax, Ls, mean(mis_ksg1_discrete, dims = 1) |> vec,
label = "KSG1 (discrete)", color = :black,
linestyle = :dash, marker = '▲')
scatterlines!(ax, Ls, mean(mis_ksg1_cont, dims = 1) |> vec,
label = "KSG1 (continuous)", color = :black,
linestyle = :dot, marker = '●')
scatterlines!(ax, Ls, mean(mis_gkov_mix, dims = 1) |> vec,
label = "GaoKannanOhViswanath (mixed)", color = :red,
marker = :utriangle)
scatterlines!(ax, Ls, mean(mis_gkov_discrete, dims = 1) |> vec,
label = "GaoKannanOhViswanath (discrete)", color = :red,
linestyle = :dash, marker = '▲')
scatterlines!(ax, Ls, mean(mis_gkov_cont, dims = 1) |> vec,
label = "GaoKannanOhViswanath (continuous)", color = :red,
linestyle = :dot, marker = '●')
axislegend(position = :rb)
fig
Estimation using DifferentialInfoEstimators: a comparison
Let's compare the performance of a subset of the implemented mutual information estimators. We'll use example data from Lord et al., where the analytical mutual information is known.
using Associations
using LinearAlgebra: det
using StateSpaceSets: StateSpaceSet
using Distributions: MvNormal
using LaTeXStrings
using CairoMakie
# adapted from https://juliadatascience.io/makie_colors
function new_cycle_theme()
# https://nanx.me/ggsci/reference/pal_locuszoom.html
my_colors = ["#D43F3AFF", "#EEA236FF", "#5CB85CFF", "#46B8DAFF",
"#357EBDFF", "#9632B8FF", "#B8B8B8FF"]
cycle = Cycle([:color, :linestyle, :marker], covary=true) # all together
my_markers = [:circle, :rect, :utriangle, :dtriangle, :diamond,
:pentagon, :cross, :xcross]
my_linestyle = [nothing, :dash, :dot, :dashdot, :dashdotdot]
return Theme(
fontsize = 22, font="CMU Serif",
colormap = :linear_bmy_10_95_c78_n256,
palette = (
color = my_colors,
marker = my_markers,
linestyle = my_linestyle,
),
Axis = (
backgroundcolor= (:white, 0.2),
xgridstyle = :dash,
ygridstyle = :dash
),
Lines = (
cycle= cycle,
),
ScatterLines = (
cycle = cycle,
),
Scatter = (
cycle = cycle,
),
Legend = (
bgcolor = (:grey, 0.05),
framecolor = (:white, 0.2),
labelsize = 13,
)
)
end
run(est; f::Function, # function that generates data
base::Real = ℯ,
nreps::Int = 10,
αs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
n::Int = 1000) =
map(α -> association(est, f(α, n)...), αs)
function compute_results(f::Function; estimators, k = 5, k_lord = 20,
n = 1000, base = ℯ, nreps = 10,
as = 7:-1:0,
αs = [1/10^(a) for a in as])
is = [zeros(length(αs)) for est in estimators]
for (k, est) in enumerate(estimators)
tmp = zeros(length(αs))
for i = 1:nreps
tmp .+= run(est; f = f, αs, base, n)
end
is[k] .= tmp ./ nreps
end
return is
end
function plot_results(f::Function, ftrue::Function;
base, estimators, k_lord, k,
as = 7:-1:0, αs = [1/10^(a) for a in as], kwargs...
)
is = compute_results(f;
base, estimators, k_lord, k, as, αs, kwargs...)
itrue = [ftrue(α; base) for α in αs]
xmin, xmax = minimum(αs), maximum(αs)
ymin = floor(Int, min(minimum(itrue), minimum(Iterators.flatten(is))))
ymax = ceil(Int, max(maximum(itrue), maximum(Iterators.flatten(is))))
f = Figure()
ax = Axis(f[1, 1],
xlabel = "α", ylabel = "I (nats)",
xscale = log10, aspect = 1,
xticks = (αs, [latexstring("10^{$(-a)}") for a in as]),
yticks = (ymin:ymax)
)
xlims!(ax, (1/10^first(as), 1/10^last(as)))
ylims!(ax, (ymin, ymax))
lines!(ax, αs, itrue,
label = "I (true)", linewidth = 4, color = :black)
for (i, est) in enumerate(estimators)
if est isa EntropyDecomposition
es = typeof(est.est).name.name |> String
else
es = typeof(est).name.name |> String
end
@show es
lbl = occursin("Lord", es) ? "$es (k = $k_lord)" : "$es (k = $k)"
scatter!(ax, αs, is[i], label = lbl)
lines!(ax, αs, is[i])
end
axislegend()
return f
end
set_theme!(new_cycle_theme())
k_lord = 20
k = 5
base = ℯ
def = MIShannon(base = ℯ)
estimators = [
EntropyDecomposition(def, Kraskov(; k)),
EntropyDecomposition(def, KozachenkoLeonenko()),
EntropyDecomposition(def, Zhu(; k)),
EntropyDecomposition(def, ZhuSingh(; k)),
EntropyDecomposition(def, Gao(; k)),
EntropyDecomposition(def, Lord(; k = k_lord)),
EntropyDecomposition(def, LeonenkoProzantoSavani(Shannon(); k)),
KSG1(def; k),
KSG2(def; k),
GaoOhViswanath(def; k),
GaoKannanOhViswanath(def; k),
GaussianMI(def),
];
12-element Vector{MultivariateInformationMeasureEstimator{MIShannon{Irrational{:ℯ}}}}:
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Kraskov{Shannon{Int64}, Int64}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Kraskov(definition = Shannon(base = 2), k = 5, w = 0, base = 2), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, KozachenkoLeonenko{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), KozachenkoLeonenko(definition = Shannon(base = 2), w = 0), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Zhu{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Zhu(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, ZhuSingh{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), ZhuSingh(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Gao{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Gao(definition = Shannon(base = 2), k = 5, w = 0, corrected = true), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Lord{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Lord(definition = Shannon(base = 2), k = 20, w = 0), nothing, nothing)
EntropyDecomposition{MIShannon{Irrational{:ℯ}}, LeonenkoProzantoSavani{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), LeonenkoProzantoSavani(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
KraskovStögbauerGrassberger1{MIShannon{Irrational{:ℯ}}, Chebyshev, Chebyshev}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Chebyshev(), Chebyshev())
KraskovStögbauerGrassberger2{MIShannon{Irrational{:ℯ}}, Chebyshev, Chebyshev}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Chebyshev(), Chebyshev())
GaoOhViswanath{MIShannon{Irrational{:ℯ}}, Euclidean, Euclidean}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Euclidean(0.0), Euclidean(0.0))
GaoKannanOhViswanath{MIShannon{Irrational{:ℯ}}}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0)
GaussianMI{MIShannon{Irrational{:ℯ}}}(MIShannon{Irrational{:ℯ}}(ℯ), true)
Example system: family 1
In this system, samples are concentrated around the diagonal $X = Y$, and the strip of samples gets thinner as $\alpha \to 0$.
function family1(α, n::Int)
x = rand(n)
v = rand(n)
y = x + α * v
return StateSpaceSet(x), StateSpaceSet(y)
end
# True mutual information values for these data
function ifamily1(α; base = ℯ)
mi = -log(α) - α - log(2)
return mi / log(base, ℯ)
end
fig = plot_results(family1, ifamily1;
k_lord = k_lord, k = k, nreps = 10, n = 800,
estimators = estimators,
base = base)
Example system: family 2
function family2(α, n::Int)
Σ = [1 α; α 1]
N2 = MvNormal(zeros(2), Σ)
D2 = StateSpaceSet([rand(N2) for i = 1:n])
X = StateSpaceSet(D2[:, 1])
Y = StateSpaceSet(D2[:, 2])
return X, Y
end
function ifamily2(α; base = ℯ)
return (-0.5 * log(1 - α^2)) / log(ℯ, base)
end
αs = 0.05:0.05:0.95
estimators = estimators
with_theme(new_cycle_theme()) do
f = Figure();
ax = Axis(f[1, 1], xlabel = "α", ylabel = "I (nats)")
is_true = map(α -> ifamily2(α), αs)
is_est = map(est -> run(est; f = family2, αs, nreps = 20), estimators)
lines!(ax, αs, is_true,
label = "I (true)", color = :black, linewidth = 3)
for (i, est) in enumerate(estimators)
if est isa EntropyDecomposition
estname = typeof(est.est).name.name |> String
else
estname = typeof(est).name.name |> String
end
scatterlines!(ax, αs, is_est[i], label = estname)
end
axislegend(position = :lt)
return f
end
Example system: family 3
In this system, we draw samples from a 4D Gaussian distribution distributed as specified in the ifamily3
function below. We let $X$ be the two first variables, and $Y$ be the two last variables.
function ifamily3(α; base = ℯ)
Σ = [7 -5 -1 -3; -5 5 -1 3; -1 -1 3 -1; -3 3 -1 2+α]
Σx = Σ[1:2, 1:2]; Σy = Σ[3:4, 3:4]
mi = 0.5*log(det(Σx) * det(Σy) / det(Σ))
return mi / log(ℯ, base)
end
function family3(α, n::Int)
Σ = [7 -5 -1 -3; -5 5 -1 3; -1 -1 3 -1; -3 3 -1 2+α]
N4 = MvNormal(zeros(4), Σ)
D4 = StateSpaceSet([rand(N4) for i = 1:n])
X = D4[:, 1:2]
Y = D4[:, 3:4]
return X, Y
end
fig = plot_results(family3, ifamily3;
k_lord = k_lord, k = k, nreps = 5, n = 800,
estimators = estimators, base = base)
We see that the Lord
estimator, which estimates local volume elements using a singular-value decomposition (SVD) of local neighborhoods, outperforms the other estimators by a large margin.
MIRenyiJizba
JointProbabilities
+ UniqueElements
MIRenyiJizba can be estimated for categorical data using the JointProbabilities estimator with the UniqueElements outcome space.
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, ["a", "b", "c"], 200);
y = rand(rng, ["hello", "yoyo", "heyhey"], 200);
est = JointProbabilities(MIRenyiJizba(), UniqueElements())
association(est, x, y)
-0.014289937341761379
EntropyDecomposition
+ LeonenkoProzantoSavani
MIRenyiJizba can also be estimated for numerical data using EntropyDecomposition in combination with any DifferentialInfoEstimator capable of estimating differential Renyi entropy.
using Associations
using Random; rng = Xoshiro(1234)
x = randn(rng, 50); y = randn(rng, 50);
def = MIRenyiJizba()
est_diff = EntropyDecomposition(def, LeonenkoProzantoSavani(Renyi(), k=3))
association(est_diff, x, y)
-0.05308726289365895
EntropyDecomposition
+ LeonenkoProzantoSavani
MIRenyiJizba can also be estimated for numerical data using EntropyDecomposition in combination with any DiscreteInfoEstimator capable of estimating discrete Renyi entropy over some OutcomeSpace, e.g. ValueBinning.
using Associations
using Random; rng = Xoshiro(1234)
x = randn(rng, 50); y = randn(rng, 50);
def = MIRenyiJizba()
disc = CodifyVariables(ValueBinning(2))
est_disc = EntropyDecomposition(def, PlugIn(Renyi()), disc);
association(est_disc, x, y)
0.044138581955657896
MIRenyiSarbu
MIRenyiSarbu
can be estimated using the JointProbabilities
estimator in combination with any CodifyVariables
or CodifyPoints
discretization scheme.
JointProbabilities
+ UniqueElements
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, ["a", "b", "c"], 200)
y = rand(rng, ["hello", "yoyo", "heyhey"], 200)
est = JointProbabilities(MIRenyiSarbu(), CodifyVariables(UniqueElements()))
association(est, x, y)
0.01866831291309387
JointProbabilities
+ CosineSimilarityBinning
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
est = JointProbabilities(MIRenyiSarbu(), CodifyVariables(CosineSimilarityBinning()))
association(est, x, y)
0.044214606254861136
MITsallisFuruichi
JointProbabilities
+ UniqueElements
MITsallisFuruichi
can be estimated using the JointProbabilities
estimator in combination with any CodifyVariables
or CodifyPoints
discretization scheme.
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
est = JointProbabilities(MITsallisFuruichi(q = 0.3), UniqueElements())
association(est, x, y)
-2.8721220837828123
EntropyDecomposition
+ LeonenkoProzantoSavani
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
est_diff = EntropyDecomposition(MITsallisFuruichi(), LeonenkoProzantoSavani(Tsallis(q= 2)))
association(est_diff, x, y)
-0.11358551843093931
EntropyDecomposition
+ Dispersion
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
disc = CodifyVariables(Dispersion())
est_disc = EntropyDecomposition(MITsallisFuruichi(), PlugIn(Tsallis()), disc)
association(est_disc, x, y)
0.345168809758591
MITsallisMartin
JointProbabilities
+ UniqueElements
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
est = JointProbabilities(MITsallisMartin(q = 1.5), UniqueElements())
association(est, x, y)
-26.28427124746185
EntropyDecomposition
+ LeonenkoProzantoSavani
MITsallisMartin
can be estimated using a decomposition into entropy terms using EntropyDecomposition
with any compatible estimator that can estimate differential Tsallis
entropy.
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 500)
y = rand(rng, 500)
est_diff = EntropyDecomposition(MITsallisMartin(), LeonenkoProzantoSavani(Tsallis(q= 1.5)))
association(est_diff, x, y)
0.1241248567715774
EntropyDecomposition
+ OrdinalPatterns
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
disc = CodifyVariables(OrdinalPatterns())
est_disc = EntropyDecomposition(MITsallisMartin(), PlugIn(Tsallis()), disc)
association(est_disc, x, y)
1.3973731206049993
CMIShannon
CMIShannon
with GaussianCMI
using Associations
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = randn(1000)
y = randn(1000) .+ x
z = randn(1000) .+ y
association(GaussianCMI(), x, z, y) # defaults to `CMIShannon()`
0.8364990010190508
CMIShannon
with FPVP
using Associations
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(FPVP(k = 5), x, z, y) # defaults to `CMIShannon()`
-0.12173963341855336
CMIShannon
with MesnerShalizi
using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)
n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(MesnerShalizi(; k = 10), x, z, y) # defaults to `CMIShannon()`
-0.11407574318482466
CMIShannon
with Rahimzamani
using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)
n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(Rahimzamani(CMIShannon(base = 10); k = 10), x, z, y)
-0.0343402204762932
MIDecomposition
Shannon-type conditional mutual information can be decomposed as a sum of mutual information terms, each of which we can estimate with any dedicated MutualInformationEstimator.
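One such decomposition follows from the chain rule of mutual information, $I^S(X; Z | Y) = I^S(X; (Y, Z)) - I^S(X; Y)$, so two mutual information estimates suffice.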
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 300)
y = rand(rng, 300) .+ x
z = rand(rng, 300) .+ y
est = MIDecomposition(CMIShannon(), KSG1(MIShannon(base = 2), k = 3))
association(est, x, z, y) # should be near 0 (and can be negative)
CMIRenyiPoczos
PoczosSchneiderCMI
using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)
n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
est = PoczosSchneiderCMI(CMIRenyiPoczos(base = 2, q = 1.2); k = 5)
association(est, x, z, y)
-0.39801425331961154
In addition to the dedicated ConditionalMutualInformationEstimators, any MutualInformationEstimator can also be used to compute conditional mutual information using the chain rule of mutual information. However, the naive application of these estimators doesn't perform any bias correction when taking the difference of mutual information terms.
CMIShannon
MIDecomposition
+ KraskovStögbauerGrassberger1
using Associations
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
est = MIDecomposition(CMIShannon(base = 2), KSG1(k = 10))
association(est, x, z, y)
-0.4634752409534487
EntropyDecomposition
+ Kraskov
Any DifferentialInfoEstimator can also be used to compute conditional mutual information using a sum of entropies. For that, we use the EntropyDecomposition estimator. No bias correction is applied for EntropyDecomposition either.
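Written out, the Shannon conditional mutual information decomposes into four entropy terms, $I^S(X; Z | Y) = H^S(X, Y) + H^S(Y, Z) - H^S(X, Y, Z) - H^S(Y)$, each of which is estimated separately.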
using Associations
using Distributions
using Random; rng = Xoshiro(1234)
n = 500
# A chain X → Y → Z
x = rand(rng, Epanechnikov(0.5, 1.0), n)
y = rand(rng, Normal(0, 0.2), n) .+ x
z = rand(rng, FDist(3, 2), n)
est = EntropyDecomposition(CMIShannon(), Kraskov(k = 5))
association(est, x, z, y)
-0.40513415764994387
Any DiscreteInfoEstimator
that computes entropy can also be used to compute conditional mutual information using a sum of entropies. For that, we also use EntropyDecomposition
. In the discrete case, we also have to specify a discretization (an OutcomeSpace
).
EntropyDecomposition
+ ValueBinning
using Associations
using Distributions
using Random; rng = Xoshiro(1234)
n = 500
# A chain X → Y → Z
x = rand(rng, Epanechnikov(0.5, 1.0), n)
y = rand(rng, Normal(0, 0.2), n) .+ x
z = rand(rng, FDist(3, 2), n)
discretization = CodifyVariables(ValueBinning(RectangularBinning(5)))
hest = PlugIn(Shannon())
est = EntropyDecomposition(CMIShannon(), hest, discretization)
association(est, x, y, z)
0.8238141245350172
CMIRenyiJizba
JointProbabilities
+ BubbleSortSwaps
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 100)
y = x .+ rand(rng, 100)
z = y .+ rand(rng, 100)
disc = CodifyVariables(BubbleSortSwaps(m = 4))
est = JointProbabilities(CMIRenyiJizba(), disc)
association(est, x, z, y)
0.43284279366170564
EntropyDecomposition
+ LeonenkoProzantoSavani
using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1000), rand(rng, 1000), rand(rng, 1000)
def = CMIRenyiJizba(q = 1.5)
# Using a differential Rényi entropy estimator
est = EntropyDecomposition(def, LeonenkoProzantoSavani(Renyi(), k = 10))
association(est, x, y, z)
-0.03274084267284355
EntropyDecomposition
+ OrdinalPatterns
using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1000), rand(rng, 1000), rand(rng, 1000)
def = CMIRenyiJizba(q = 1.5)
# Using a plug-in Rényi entropy estimator, discretizing using ordinal patterns.
est = EntropyDecomposition(def, PlugIn(Renyi()), CodifyVariables(OrdinalPatterns(m=2)), RelativeAmount())
association(est, x, y, z)
3.961338273517079e-6
TEShannon
EntropyDecomposition
+ TransferOperator
For transfer entropy examples, we'll construct some time series for which there is time-delayed forcing between variables.
using Associations
using DynamicalSystemsBase
using StableRNGs
rng = StableRNG(123)
Base.@kwdef struct Logistic4Chain{V, RX, RY, RZ, RW, C1, C2, C3, Σ1, Σ2, Σ3, RNG}
xi::V = [0.1, 0.2, 0.3, 0.4]
rx::RX = 3.9
ry::RY = 3.6
rz::RZ = 3.6
rw::RW = 3.8
c_xy::C1 = 0.4
c_yz::C2 = 0.4
c_zw::C3 = 0.35
σ_xy::Σ1 = 0.05
σ_yz::Σ2 = 0.05
σ_zw::Σ3 = 0.05
rng::RNG = Random.default_rng()
end
function eom_logistic4_chain(u, p::Logistic4Chain, t)
(; xi, rx, ry, rz, rw, c_xy, c_yz, c_zw, σ_xy, σ_yz, σ_zw, rng) = p
x, y, z, w = u
f_xy = (y + c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
f_yz = (z + c_yz*(y + σ_yz * rand(rng)) ) / (1 + c_yz*(1+σ_yz))
f_zw = (w + c_zw*(z + σ_zw * rand(rng)) ) / (1 + c_zw*(1+σ_zw))
dx = rx * x * (1 - x)
dy = ry * (f_xy) * (1 - f_xy)
dz = rz * (f_yz) * (1 - f_yz)
dw = rw * (f_zw) * (1 - f_zw)
return SVector{4}(dx, dy, dz, dw)
end
function system(definition::Logistic4Chain)
return DiscreteDynamicalSystem(eom_logistic4_chain, definition.xi, definition)
end
# An example system where `X → Y → Z → W`.
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))
precise = true # precise bin edges
discretization = CodifyVariables(TransferOperator(RectangularBinning(2, precise))) #
est_disc_to = EntropyDecomposition(TEShannon(), PlugIn(Shannon()), discretization);
association(est_disc_to, x, y), association(est_disc_to, y, x)
(0.2529054207192689, 0.10426214218657259)
The Shannon-type transfer entropy from x
to y
is stronger than from y
to x
, which is what we expect if x
drives y
.
association(est_disc_to, x, z), association(est_disc_to, x, z, y)
(0.05336193817928536, 0.023756513924308997)
The Shannon-type transfer entropy from x
to z
is stronger than the transfer entropy from x
to z
given y
. This is expected, because x
drives z
through y
, so "conditioning away" the effect of y
should decrease the estimated information transfer.
CMIDecomposition
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
y = rand(rng, 1000) .+ x
z = rand(rng, 1000) .+ y
# Estimate transfer entropy by representing it as a CMI and using the `FPVP` estimator.
est = CMIDecomposition(TEShannon(base = 2), FPVP(k = 3))
association(est, x, z, y) # should be near 0 (and can be negative)
-0.24216957851401474
SymbolicTransferEntropy
estimator
The SymbolicTransferEntropy
estimator is just a convenience wrapper which utilizes CodifyVariables
with the OrdinalPatterns
outcome space to discretize the input time series before computing transfer entropy.
We'll use coupled time series from the logistic4
system above, where x → y → z → w
. Thus, we expect that the association for the direction x → y
is larger than for y → x
. We also expect an association x → z
, but the association should weaken when conditioning on the intermediate value y
.
using Associations
using DynamicalSystemsBase
using Random; rng = Xoshiro(1234)
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))
est = SymbolicTransferEntropy(m = 5)
association(est, x, y), association(est, y, x), association(est, x, z), association(est, x, z, y)
(0.5174597240325244, 0.24773918384758742, 0.7233610447154118, 0.1322241113229709)
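For comparison, an explicit construction along the lines described above might look like the sketch below. It assumes the wrapper's discretization corresponds to CodifyVariables with OrdinalPatterns(m = 5); it is not guaranteed to reproduce the numbers above exactly.
disc = CodifyVariables(OrdinalPatterns(m = 5))
est_explicit = JointProbabilities(TEShannon(), disc)
association(est_explicit, x, y), association(est_explicit, y, x)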
Comparing different estimators
Let's reproduce Figure 4 from Zhu et al. (2015), where they test some dedicated transfer entropy estimators on a bivariate autoregressive system. We will test
- The Lindner and Zhu1 dedicated transfer entropy estimators, which try to eliminate bias.
- The KraskovStögbauerGrassberger1 estimator, which computes TE naively as a sum of mutual information terms (without guaranteed cancellation of biases for the total sum).
- The Kraskov estimator, which computes TE naively as a sum of entropy terms (without guaranteed cancellation of biases for the total sum).
using Associations
using CairoMakie
using Statistics
using Distributions: Normal
function model2(n::Int)
𝒩x = Normal(0, 0.1)
𝒩y = Normal(0, 0.1)
x = zeros(n+2)
y = zeros(n+2)
x[1] = rand(𝒩x)
x[2] = rand(𝒩x)
y[1] = rand(𝒩y)
y[2] = rand(𝒩y)
for i = 3:n+2
x[i] = 0.45*sqrt(2)*x[i-1] - 0.9*x[i-2] - 0.6*y[i-2] + rand(𝒩x)
y[i] = 0.6*x[i-2] - 0.175*sqrt(2)*y[i-1] + 0.55*sqrt(2)*y[i-2] + rand(𝒩y)
end
return x[3:end], y[3:end]
end
te_true = 0.42 # eyeball the theoretical value from their Figure 4.
m = TEShannon(embedding = EmbeddingTE(dT = 2, dS = 2), base = ℯ)
estimators = [
Zhu1(m, k = 8),
Lindner(m, k = 8),
MIDecomposition(m, KSG1(k = 8)),
EntropyDecomposition(m, Kraskov(k = 8)),
]
Ls = [floor(Int, 2^i) for i in 8.0:0.5:11]
nreps = 8
tes_xy = [[zeros(nreps) for i = 1:length(Ls)] for e in estimators]
tes_yx = [[zeros(nreps) for i = 1:length(Ls)] for e in estimators]
for (k, est) in enumerate(estimators)
for (i, L) in enumerate(Ls)
for j = 1:nreps
x, y = model2(L);
tes_xy[k][i][j] = association(est, x, y)
tes_yx[k][i][j] = association(est, y, x)
end
end
end
ymin = minimum(map(x -> minimum(Iterators.flatten(Iterators.flatten(x))), (tes_xy, tes_yx)))
estimator_names = ["Zhu1", "Lindner", "KSG1", "Kraskov"]
ls = [:dash, :dot, :dash, :dot]
mr = [:rect, :hexagon, :xcross, :pentagon]
fig = Figure(resolution = (800, 350))
ax_xy = Axis(fig[1,1], xlabel = "Signal length", ylabel = "TE (nats)", title = "x → y")
ax_yx = Axis(fig[1,2], xlabel = "Signal length", ylabel = "TE (nats)", title = "y → x")
for (k, e) in enumerate(estimators)
label = estimator_names[k]
marker = mr[k]
scatterlines!(ax_xy, Ls, mean.(tes_xy[k]); label, marker)
scatterlines!(ax_yx, Ls, mean.(tes_yx[k]); label, marker)
hlines!(ax_xy, [te_true]; xmin = 0.0, xmax = 1.0, linestyle = :dash, color = :black)
hlines!(ax_yx, [te_true]; xmin = 0.0, xmax = 1.0, linestyle = :dash, color = :black)
linkaxes!(ax_xy, ax_yx)
end
axislegend(ax_xy, position = :rb)
fig
Reproducing Schreiber (2000)
Let's try to reproduce the results from Schreiber's original paper (Schreiber, 2000) where he introduced the transfer entropy. We'll here use the JointProbabilities
estimator, discretizing per column of the input data using the CodifyVariables
discretization scheme with the ValueBinning
outcome space.
using Associations
using DynamicalSystemsBase
using CairoMakie
using Statistics
using Random; Random.seed!(12234);
function ulam_system(dx, x, p, t)
f(x) = 2 - x^2
ε = p[1]
dx[1] = f(ε*x[length(dx)] + (1-ε)*x[1])
for i in 2:length(dx)
dx[i] = f(ε*x[i-1] + (1-ε)*x[i])
end
end
ds = DiscreteDynamicalSystem(ulam_system, rand(100) .- 0.5, [0.04])
first(trajectory(ds, 1000; Ttr = 1000));
εs = 0.02:0.02:1.0
te_x1x2 = zeros(length(εs)); te_x2x1 = zeros(length(εs))
# Guess an appropriate bin width of 0.2 for the histogram
disc = CodifyVariables(ValueHistogram(0.2))
est = JointProbabilities(TEShannon(; base = 2), disc)
for (i, ε) in enumerate(εs)
set_parameter!(ds, 1, ε)
tr = first(trajectory(ds, 300; Ttr = 5000))
X1 = tr[:, 1]; X2 = tr[:, 2]
@assert !any(isnan, X1)
@assert !any(isnan, X2)
te_x1x2[i] = association(est, X1, X2)
te_x2x1[i] = association(est, X2, X1)
end
fig = Figure(size = (800, 600))
ax = Axis(fig[1, 1], xlabel = "epsilon", ylabel = "Transfer entropy (bits)")
lines!(ax, εs, te_x1x2, label = "X1 to X2", color = :black, linewidth = 1.5)
lines!(ax, εs, te_x2x1, label = "X2 to X1", color = :red, linewidth = 1.5)
axislegend(ax, position = :lt)
fig
As expected, transfer entropy from X1
to X2
is higher than from X2
to X1
across parameter values for ε
. But, by our definition of the Ulam system, dynamical coupling only occurs from X1
to X2
. The results, however, show nonzero transfer entropy in both directions. What does this mean?
Computing transfer entropy from finite time series introduces bias, and so does any particular choice of entropy estimator used to calculate it. To determine whether a transfer entropy estimate should be trusted, we can employ surrogate testing. We'll generate surrogates using TimeseriesSurrogates.jl. One possible way to do so is to use a SurrogateAssociationTest with independence, but here we'll do the surrogate resampling manually, so we can plot and inspect the results.
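A minimal sketch of that alternative is shown below. It assumes that SurrogateAssociationTest accepts the estimator together with an nshuffles keyword, and that independence returns a test summary from which a p-value can be read off; check the docstrings for the exact signatures.
using Associations
x1, x2 = rand(300), rand(300)  # stand-in time series
disc = CodifyVariables(ValueHistogram(0.2))
est = JointProbabilities(TEShannon(; base = 2), disc)
test = SurrogateAssociationTest(est; nshuffles = 100)  # assumed keyword
independence(test, x1, x2)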
In the example below, we continue with the same time series generated above. However, at each value of ε, we also compute transfer entropy for nsurr = 25 different randomly shuffled (permuted) versions of the source process. If the original transfer entropy exceeds a high percentile (here, the 95th) of the surrogate ensemble's transfer entropy estimates, we will take that as "significant" transfer entropy.
nsurr = 25 # in real applications, you should use more surrogates
base = 2
te_x1x2 = zeros(length(εs)); te_x2x1 = zeros(length(εs))
te_x1x2_surr = zeros(length(εs), nsurr); te_x2x1_surr = zeros(length(εs), nsurr)
# use same bin-width as before
disc = CodifyVariables(ValueHistogram(0.2))
est = JointProbabilities(TEShannon(; base = 2), disc)
for (i, ε) in enumerate(εs)
set_parameter!(ds, 1, ε)
tr = first(trajectory(ds, 300; Ttr = 5000))
X1 = tr[:, 1]; X2 = tr[:, 2]
@assert !any(isnan, X1)
@assert !any(isnan, X2)
te_x1x2[i] = association(est, X1, X2)
te_x2x1[i] = association(est, X2, X1)
s1 = surrogenerator(X1, RandomShuffle()); s2 = surrogenerator(X2, RandomShuffle())
for j = 1:nsurr
te_x1x2_surr[i, j] = association(est, s1(), X2)
te_x2x1_surr[i, j] = association(est, s2(), X1)
end
end
# Compute 95th percentiles of the surrogates for each ε
qs_x1x2 = [quantile(te_x1x2_surr[i, :], 0.95) for i = 1:length(εs)]
qs_x2x1 = [quantile(te_x2x1_surr[i, :], 0.95) for i = 1:length(εs)]
fig = with_theme(theme_minimal(), markersize = 2) do
fig = Figure()
ax = Axis(fig[1, 1], xlabel = "epsilon", ylabel = "Transfer entropy (bits)")
scatterlines!(ax, εs, te_x1x2, label = "X1 to X2", color = :black, linewidth = 1.5)
scatterlines!(ax, εs, qs_x1x2, color = :black, linestyle = :dot, linewidth = 1.5)
scatterlines!(ax, εs, te_x2x1, label = "X2 to X1", color = :red)
scatterlines!(ax, εs, qs_x2x1, color = :red, linestyle = :dot)
axislegend(ax, position = :lt)
return fig
end
fig
The plot above shows the original transfer entropies (solid lines) and the 95th percentile transfer entropies of the surrogate ensembles (dotted lines). As expected, using the surrogate test, the transfer entropies from X1
to X2
are mostly significant (solid black line is above the dotted black line). The transfer entropies from X2
to X1
, on the other hand, are mostly not significant (red solid line is below red dotted line).
TERenyiJizba
EntropyDecomposition
+ TransferOperator
We can perform the same type of analysis as above using TERenyiJizba
instead of TEShannon
.
using Associations
using DynamicalSystemsBase
using StableRNGs; rng = StableRNG(123)
# An example system where `X → Y → Z → W`.
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))
precise = true # precise bin edges
discretization = CodifyVariables(TransferOperator(RectangularBinning(2, precise))) #
est_disc_to = EntropyDecomposition(TERenyiJizba(), PlugIn(Renyi()), discretization);
association(est_disc_to, x, y), association(est_disc_to, y, x)
(0.2551426443387981, 0.11917462376531485)
ConvergentCrossMapping
RandomVectors
estimator
When cross-mapping with the RandomVectors
estimator, a single random subsample of time indices (i.e. not in any particular order) of length l
is drawn for each library size l
, and cross mapping is performed using the embedding vectors corresponding to those time indices.
using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)
# We'll draw a single sample at each `l ∈ libsizes`. Sampling with replacement is then
# necessary, because our 200-pt timeseries will result in embeddings with
# less than 200 points.
est = RandomVectors(ConvergentCrossMapping(d = 3); libsizes = 50:25:200, replace = true, rng)
crossmap(est, x, y)
7-element Vector{Float64}:
0.17433429514274518
0.5142460610784763
0.4641719985078984
0.5662783237063713
0.6370489394540293
0.6882438178857719
0.7259854477313521
To generate a distribution of cross-map estimates for each l ∈ libsizes
, just call crossmap repeatedly, e.g.
using Associations
using Random; rng = MersenneTwister(1234)
using Statistics
x, y = randn(rng, 300), randn(rng, 300)
def = ConvergentCrossMapping(d = 3)
libsizes = 25:25:200
ρs = [[crossmap(RandomVectors(def; libsizes = L, replace = true, rng), x, y) for i = 1:50] for L in libsizes]
using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f
Now, the k
-th element of ρs
contains 50
estimates of the correspondence measure ρ
at library size libsizes[k]
.
RandomSegment
estimator
When cross-mapping with the RandomSegment
estimator, a single random subsample of contiguous, ordered time indices of length l
is drawn for each library size l
, and cross mapping is performed using the embedding vectors corresponding to those time indices.
using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)
# We'll draw a single sample at each `l ∈ libsizes`. We limit the library size to 100,
# because drawing segments of the data longer than half the available data doesn't make
# much sense.
est = RandomSegment(ConvergentCrossMapping(d = 3); libsizes = 50:25:100, rng)
crossmap(est, x, y)
3-element Vector{Float64}:
-0.08620592976464757
-0.08267826453406518
0.0904172072854787
As above, to generate a distribution of cross-map estimates for each l ∈ libsizes
, just call crossmap repeatedly, e.g.
using Associations
using Random; rng = MersenneTwister(1234)
using Statistics
x, y = randn(rng, 200), randn(rng, 200)
def = ConvergentCrossMapping(d = 3)
libsizes = 25:25:100
ρs = [[crossmap(RandomSegment(def; libsizes = L, rng), x, y) for i = 1:50] for L in libsizes]
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f
PairwiseAsymmetricInference
We repeat the analyses above, but here use the pairwise asymmetric inference algorithm instead of the convergent cross map algorithm.
RandomVectors
estimator
using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 300), randn(rng, 300)
# We'll draw a single sample at each `l ∈ libsizes`. Sampling with replacement is then
# necessary, because our 300-pt timeseries will result in embeddings with
# less than 300 points.
est = RandomVectors(PairwiseAsymmetricInference(d = 3); libsizes = 50:25:200, replace = true, rng)
crossmap(est, x, y)
7-element Vector{Float64}:
0.2888296517259285
0.29878693023589387
0.4933507230328284
0.6004837656299017
0.5902032764829309
0.520932297732788
0.523148380266996
To generate a distribution of cross-map estimates for each l ∈ libsizes
, just call crossmap repeatedly, e.g.
using Associations
using Random; rng = MersenneTwister(1234)
using Statistics
x, y = randn(rng, 300), randn(rng,300)
def = PairwiseAsymmetricInference(d = 3)
libsizes = 25:25:200
ρs = [[crossmap(RandomVectors(def; libsizes = L, replace = true, rng), x, y) for i = 1:50] for L in libsizes]
using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f
RandomSegment
estimator
using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)
# We'll draw a single sample at each `l ∈ libsizes`. We limit the library size to 100,
# because drawing segments of the data longer than half the available data doesn't make
# much sense.
est = RandomSegment(PairwiseAsymmetricInference(d = 3); libsizes = 50:25:100, rng)
crossmap(est, x, y)
3-element Vector{Float64}:
-0.2825168824961344
-0.2431318705493036
-0.2830973260463694
As above, to generate a distribution of cross-map estimates for each l ∈ libsizes
, just call crossmap repeatedly, e.g.
using Associations
using Random; rng = MersenneTwister(1234)
using Statistics
x, y = randn(rng, 300), randn(rng, 300)
def = PairwiseAsymmetricInference(d = 3)
libsizes = 25:25:100
ρs = [[crossmap(RandomSegment(def; libsizes = L, rng), x, y) for i = 1:50] for L in libsizes]
using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f
MCR
To quantify association by the mean conditional probability of recurrence (MCR), we'll create a chain of variables where X
drives Y
, which in turn drives Z
. We then expect there to be significant detectable association between both X
and Y
, Y
and Z
and also X
and Z
(because Y
transfers information from X
to Z). We expect the association between X
and Z
to disappear when conditioning on Y
(since we're then "removing the effect" of Y
).
using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 300); y = rand(rng, 300) .* sin.(x); z = rand(rng, 300) .* y;
est = MCR(r = 0.5)
association(est, x, y), association(est, x, z), association(est, y, z), association(est, x, z, y)
(0.7711920426340347, 0.7420689143732229, 0.924270403653336, -0.00798860118526612)
The interpretation of the MCR
measure is that if two variables are symmetrically coupled, then the conditional recurrence in both directions is equal. Uncoupled variables are trivially symmetrically coupled (there is no coupling in either direction), so we expect the difference in conditional recurrence to be around zero.
using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 300)
y = rand(rng, 300)
m = MCR(r = 0.5)
Δ = association(m, x, y) - association(m, y, x)
-0.008308662176546688
RMCD
To quantify association by the recurrence measure of conditional dependence (RMCD), we'll create a chain of variables where X
drives Y
, which in turn drives Z
. We then expect there to be significant detectable association between both X
and Y
, Y
and Z
and also X
and Z
(because Y
transfers information from X
to Z). We expect the association between X
and Z
to disappear when conditioning on Y
(since we're then "removing the effect" of Y
).
using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 300); y = rand(rng, 300) .* sin.(x); z = rand(rng, 300) .* y;
est = RMCD(r = 0.5)
association(est, x, y), association(est, x, z), association(est, x, z, y)
(0.022387457155671135, 0.022129475954601702, 0.0)
ChatterjeeCorrelation
using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 120)
y = rand(rng, 120) .* sin.(x)
120-element Vector{Float64}:
0.2755719299262952
0.2070408113032807
0.6193365054669813
0.008237347924455006
0.04148542594436429
0.313817853101071
0.6257653008229676
0.43134904735215984
0.6829986456858073
0.3095824205696399
⋮
0.02169777371442121
0.7764979588125409
0.1660944201176525
0.10967358670001909
0.03230334731057383
0.026337556651936055
0.11124764808654403
0.2599193572475318
0.4628271687638507
By construction, there will almost surely be no ties in x
, so we can use the fast estimate by setting handle_ties == false
.
association(ChatterjeeCorrelation(handle_ties = false), x, y)
0.2726578234599625
If we have data where we know there are ties in the data, then we should set handle_ties == true
.
w = rand(rng, 1:10, 120) # there will be some ties
z = rand(rng, 1:15, 120) .* sin.(w) # introduce some dependence
association(ChatterjeeCorrelation(handle_ties = true), w, z)
0.6016268641151319
AzadkiaChatterjeeCoefficient
using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y
120-element Vector{Float64}:
0.44737499407485565
0.5516487384954861
0.7581708420215838
0.23074778575472404
0.2500819453735219
0.7353227843349248
1.1500496733900634
0.7621477984126711
0.851799327542775
0.5247237075962297
⋮
0.3943206855229302
1.838156393683287
0.9709273173395769
0.23333040123923995
0.26781788473231355
0.40232936334694847
0.26839516685160714
1.0709803535460094
0.8668408207007512
For the variables above, where x → y → z
, we expect stronger association between x
and y
than between x
and z
. We also expect the strength of the association between x
and z
to drop when conditioning on y
, because y
is the variable that connects x
and z
.
m = AzadkiaChatterjeeCoefficient(theiler = 0) # only exclude self-neighbors
association(m, x, y), association(m, x, z), association(m, x, z, y)
(0.3062018195708035, 0.06951871657754011, 0.010810810810810811)