Examples of association measure estimation

`HellingerDistance`

From precomputed probabilities

using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(HellingerDistance(), p1, p2)

0.055856605640373105

`JointProbabilities` + `OrdinalPatterns`

We expect the Hellinger distance between two uncorrelated variables to be close to zero.

using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(HellingerDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero

3.1541797712493756e-6

`KLDivergence`

From precomputed probabilities

using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(KLDivergence(), p1, p2)

0.20998654701098748

`JointProbabilities` + `OrdinalPatterns`

We expect the KLDivergence between two uncorrelated variables to be close to zero.

using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(KLDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero

1.2875933381844337e-5

`RenyiDivergence`

From precomputed probabilities

using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(RenyiDivergence(), p1, p2)

0.11627470204599202

`JointProbabilities` + `OrdinalPatterns`

We expect the RenyiDivergence between two uncorrelated variables to be close to zero.

using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(RenyiDivergence(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero

6.4354135893600935e-6

`VariationDistance`

From precomputed probabilities

using Associations
# From pre-computed PMFs
p1 = Probabilities([0.1, 0.5, 0.2, 0.2])
p2 = Probabilities([0.3, 0.3, 0.2, 0.2])
association(VariationDistance(), p1, p2)

0.2

`JointProbabilities` + `OrdinalPatterns`

We expect the VariationDistance between two uncorrelated variables to be close to zero.

using Associations
using Random; rng = Xoshiro(1234)
n = 100000
x, y = rand(rng, n), rand(rng, n)
est = JointProbabilities(VariationDistance(), CodifyVariables(OrdinalPatterns(m=3)))
div_hd = association(est, x, y) # pretty close to zero

0.001530030600612034

`JointEntropyShannon`

`JointProbabilities` with `Dispersion`

using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyShannon()
discretization = CodifyVariables(Dispersion(m = 2, c = 3))
est = JointProbabilities(measure, discretization)
association(est, x, y)

3.115303256966459

`JointEntropyTsallis`

`JointProbabilities` with `OrdinalPatterns`

using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyTsallis()
discretization = CodifyVariables(OrdinalPatterns(m = 3))
est = JointProbabilities(measure, discretization)
association(est, x, y)

2.3340683377759546

`JointEntropyRenyi`

`JointProbabilities` with `OrdinalPatterns`

using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
measure = JointEntropyRenyi(q = 0.5)
discretization = CodifyVariables(ValueBinning(2))
est = JointProbabilities(measure, discretization)
association(est, x, y)

1.9472483134811365

This is essentially example 2.2.1 in Cover & Thomas (2006), where they use the following relative frequency table as an example. Notethat Julia is column-major, so we need to transpose their example. Then their X is in the first dimension of our table (along columns) and their Y is our second dimension (rows).

using Associations
freqs_yx = [1//8 1//16 1//32 1//32;
    1//16 1//8  1//32 1//32;
    1//16 1//16 1//16 1//16;
    1//4  0//1  0//1  0//1];
# `freqs_yx` is already normalized, se we can feed it directly to `Probabilities`
pxy = Probabilities(freqs_yx)

 4×4 Probabilities{Rational{Int64},2}
                Outcome(1)     Outcome(2)     Outcome(3)     Outcome(4)
 Outcome(1)             1//8           1//16          1//32          1//32
 Outcome(2)             1//16          1//8           1//32          1//32
 Outcome(3)             1//16          1//16          1//16          1//16
 Outcome(4)             1//4           0//1           0//1           0//1

The marginal distribution for x (first dimension) is

marginal(pxy, dims = 2)

 Probabilities{Rational{Int64},1} over 4 outcomes
 Outcome(1)  1//2
 Outcome(2)  1//4
 Outcome(3)  1//8
 Outcome(4)  1//8

The marginal distribution for y (second dimension) is

marginal(pxy, dims = 1)

 Probabilities{Rational{Int64},1} over 4 outcomes
 Outcome(1)  1//4
 Outcome(2)  1//4
 Outcome(3)  1//4
 Outcome(4)  1//4

And the Shannon conditional entropy $H^S(X | Y)$

ce_x_given_y = association(ConditionalEntropyShannon(), pxy) |> Rational

13//8

This is the same as in their example. Hooray! To compute $H^S(Y | X)$, we just need to flip the contingency matrix.

pyx = Probabilities(transpose(freqs_yx))
ce_y_given_x = association(ConditionalEntropyShannon(), pyx) |> Rational

11//8

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

We can of course also estimate conditional entropy from data. To do so, we'll use the JointProbabilities estimator, which constructs a multivariate PMF for us. Thus, we don't explicitly need a set of counts, like in the example above, because they are estimated under the hood for us.

Let's first demonstrate on some categorical data. For that, we must use UniqueElements as the discretization (i.e. just count unique elements).

using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)

disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyShannon(), disc)
association(est, rating, movie)

2.5744931842297527

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyShannon(), disc);
association(est, X, Y)

1.8234663200044612

`ConditionalEntropyTsallisAbe`

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

We'll here repeat the analysis we did for ConditionalEntropyShannon above.

using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)

disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisAbe(q =1.5), disc)
association(est, rating, movie)

1.7608975191987752

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisAbe(q = 1.5), disc);
association(est, X, Y)

1.8293074543081442

`ConditionalEntropyTsallisFuruichi`

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

We'll here repeat the analysis we did for ConditionalEntropyShannon and ConditionalEntropyTsallisAbe above.

using Associations
using Random; rng = Xoshiro(1234)
n = 1000
rating = rand(rng, 1:6, n)
movie = rand(rng, ["The Witcher: the movie", "Lord of the Rings"], n)

disc = CodifyVariables(UniqueElements())
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q =0.5), disc)
association(est, rating, movie)

4.087134896624493

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1:5, 100), rand(rng, 1:5, 100), rand(rng, 1:3, 100)
X = StateSpaceSet(x, z)
Y = StateSpaceSet(y, z)
disc = CodifyPoints(UniqueElementsEncoding(X), UniqueElementsEncoding(Y));
est = JointProbabilities(ConditionalEntropyTsallisFuruichi(q = 0.5), disc);
association(est, X, Y)

6.6943366123303045

`MIShannon`

`JointProbabilities` + `ValueBinning`

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
y = rand(rng, 1000)
discretization = CodifyVariables(ValueBinning(FixedRectangularBinning(0, 1, 5)))
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)

0.003096460803260294

`JointProbabilities` + `UniqueElements`

The JointProbabilities estimator can also be used with categorical data. For example, let's compare the Shannon mutual information between the preferences of a population sample with regards to different foods.

using Associations
n = 1000
preferences = rand(["neutral", "like it", "hate it"], n);
random_foods = rand(["water", "flour", "bananas", "booze", "potatoes", "beans", "soup"], n)
biased_foods = map(preferences) do preference
    if cmp(preference, "neutral") == 1
        return rand(["water", "flour"])
    elseif cmp(preference, "like it") == 1
        return rand(["bananas", "booze"])
    else
        return rand(["potatoes", "beans", "soup"])
    end
end

est = JointProbabilities(MIShannon(), UniqueElements())
association(est, preferences, biased_foods), association(est, preferences, random_foods)

(0.9035266564823072, 0.01304436347224865)

Dedicated `GaussianMI` estimator

using Associations
using Distributions
using Statistics

n = 1000
using Associations
x = randn(1000)
y = rand(1000) .+ x
association(GaussianMI(MIShannon()), x, y) # defaults to `MIShannon()`

1.866550039012755

Dedicated `KraskovStögbauerGrassberger2` estimator

using Associations
x, y = rand(1000), rand(1000)
association(KSG1(MIShannon(); k = 5), x, y)

-0.07752470141426843

Dedicated `KraskovStögbauerGrassberger2` estimator

using Associations
x, y = rand(1000), rand(1000)
association(KSG2(MIShannon(); k = 5), x, y)

-0.04178614608775437

Dedicated `GaoKannanOhViswanath` estimator

using Associations
x, y = rand(1000), rand(1000)
association(GaoKannanOhViswanath(MIShannon(); k = 10), x, y)

-0.0058145710531963355

`EntropyDecomposition` + `Kraskov`

We can compute MIShannon by naively applying a DifferentialInfoEstimator. Note that this doesn't apply any bias correction.

using Associations
x, y = rand(1000), rand(1000)
association(EntropyDecomposition(MIShannon(), Kraskov(k = 3)), x, y)

-0.015097119033813798

`EntropyDecomposition` + `BubbleSortSwaps`

We can also compute MIShannon by naively applying a DiscreteInfoEstimator. Note that this doesn't apply any bias correction.

using Associations
x, y = rand(1000), rand(1000)
disc = CodifyVariables(BubbleSortSwaps(m=5))
hest = PlugIn(Shannon())
association(EntropyDecomposition(MIShannon(), hest, disc), x, y)

0.06500858664420583

`EntropyDecomposition` + `Jackknife` + `ValueBinning`

Shannon mutual information can be written as a sum of marginal entropy terms. Here, we use CodifyVariables with ValueBinning bin the data and compute discrete Shannon mutual information.

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 50)
y = rand(rng, 50)

# Use the H3-estimation method with a discrete visitation frequency based
# probabilities estimator over a fixed grid covering the range of the data,
# which is on [0, 1].
discretization = CodifyVariables(ValueBinning(FixedRectangularBinning(0, 1, 5)))
hest = Jackknife(Shannon())
est = EntropyDecomposition(MIShannon(), hest, discretization)
association(est, x, y)

-0.14432478545988658

Reproducing Kraskov et al. (2004)

Here, we'll reproduce Figure 4 from Kraskov et al. (2004)'s seminal paper on the nearest-neighbor based mutual information estimator. We'll estimate the mutual information between marginals of a bivariate Gaussian for a fixed time series length of 1000, varying the number of neighbors. Note: in the original paper, they show multiple curves corresponding to different time series length. We only show two single curves: one for the KraskovStögbauerGrassberger2 estimator and one for the KraskovStögbauerGrassberger2 estimator.

using Associations
using LinearAlgebra: det
using Distributions: MvNormal
using StateSpaceSets: StateSpaceSet
using CairoMakie
using Statistics

N = 800
c = 0.9
Σ = [1 c; c 1]
N2 = MvNormal([0, 0], Σ)
mitrue = -0.5*log(det(Σ)) # in nats
ks = [2; 5; 7; 10:10:70] .* 2

nreps = 10 # plot average over 10 independent realizations
mis_ksg1 = zeros(nreps, length(ks))
mis_ksg2 = zeros(nreps, length(ks))
for i = 1:nreps
    D2 = StateSpaceSet([rand(N2) for i = 1:N])
    X = D2[:, 1] |> StateSpaceSet
    Y = D2[:, 2] |> StateSpaceSet
    for (j, k) in enumerate(ks)
        est1 = KSG1(MIShannon(; base = ℯ); k)
        est2 = KSG2(MIShannon(; base = ℯ); k)
        mis_ksg1[i, j] = association(est1, X, Y)
        mis_ksg2[i, j] = association(est2, X, Y)
    end
end
fig = Figure()
ax = Axis(fig[1, 1], xlabel = "k / N", ylabel = "Mutual infomation (nats)")
scatterlines!(ax, ks ./ N, mean(mis_ksg1, dims = 1) |> vec, label = "KSG1")
scatterlines!(ax, ks ./ N, mean(mis_ksg2, dims = 1) |> vec, label = "KSG2")
hlines!(ax, [mitrue], color = :black, linewidth = 3, label = "I (true)")
axislegend()
fig

Estimator comparison for `MIShannon`

Most estimators suffer from significant bias when applied to discrete, finite data. One possible resolution is to add a small amount of noise to discrete variables, so that the data becomes continuous in practice.

But instead of adding noise to your data, you can also consider using an estimator that is specifically designed to deal with continuous-discrete mixture data. One example is the GaoKannanOhViswanath estimator. Below, we compare its performance to KraskovStögbauerGrassberger2 on uniformly distributed discrete multivariate data. The true mutual information is zero. While the "naive" KraskovStögbauerGrassberger2 estimator diverges from the true value for these data, the GaoKannanOhViswanath converges to the true value.

using Associations
using Statistics
using StateSpaceSets: StateSpaceSet
using Statistics: mean
using CairoMakie

function compare_ksg_gkov(;
        k = 5,
        base = 2,
        nreps = 10,
        Ls = [500:100:1000; 1500; 2500; 5000; 7000])


    mis_ksg1_mix = zeros(nreps, length(Ls))
    mis_ksg1_discrete = zeros(nreps, length(Ls))
    mis_ksg1_cont = zeros(nreps, length(Ls))
    mis_gkov_mix = zeros(nreps, length(Ls))
    mis_gkov_discrete = zeros(nreps, length(Ls))
    mis_gkov_cont = zeros(nreps, length(Ls))

    for (j, L) in enumerate(Ls)
        for i = 1:nreps
            X = StateSpaceSet(float.(rand(1:8, L, 2)))
            Y = StateSpaceSet(float.(rand(1:8, L, 2)))
            Z = StateSpaceSet(rand(L, 2))
            W = StateSpaceSet(rand(L, 2))
            est_gkov = GaoKannanOhViswanath(MIShannon(; base = ℯ); k)
            est_ksg1 = KSG1(MIShannon(; base = ℯ); k)
            mis_ksg1_discrete[i, j] = association(est_ksg1, X, Y)
            mis_gkov_discrete[i, j] = association(est_gkov, X, Y)
            mis_ksg1_mix[i, j] = association(est_ksg1, X, Z)
            mis_gkov_mix[i, j] = association(est_gkov, X, Z)
            mis_ksg1_cont[i, j] = association(est_ksg1, Z, W)
            mis_gkov_cont[i, j] = association(est_gkov, Z, W)
        end
    end
    return mis_ksg1_mix, mis_ksg1_discrete, mis_ksg1_cont,
        mis_gkov_mix, mis_gkov_discrete, mis_gkov_cont
end

fig = Figure()
ax = Axis(fig[1, 1],
    xlabel = "Sample size",
    ylabel = "Mutual information (bits)")
Ls = [100; 200; 500; 1000; 2500; 5000; 7000]
nreps = 5
k = 3
mis_ksg1_mix, mis_ksg1_discrete, mis_ksg1_cont,
    mis_gkov_mix, mis_gkov_discrete, mis_gkov_cont =
    compare_ksg_gkov(; nreps, k, Ls)

scatterlines!(ax, Ls, mean(mis_ksg1_mix, dims = 1) |> vec,
    label = "KSG1 (mixed)", color = :black,
    marker = :utriangle)
scatterlines!(ax, Ls, mean(mis_ksg1_discrete, dims = 1) |> vec,
    label = "KSG1 (discrete)", color = :black,
    linestyle = :dash, marker = '▲')
scatterlines!(ax, Ls, mean(mis_ksg1_cont, dims = 1) |> vec,
    label = "KSG1 (continuous)", color = :black,
    linestyle = :dot, marker = '●')
scatterlines!(ax, Ls, mean(mis_gkov_mix, dims = 1) |> vec,
    label = "GaoKannanOhViswanath (mixed)", color = :red,
    marker = :utriangle)
scatterlines!(ax, Ls, mean(mis_gkov_discrete, dims = 1) |> vec,
    label = "GaoKannanOhViswanath (discrete)", color = :red,
    linestyle = :dash, marker = '▲')
scatterlines!(ax, Ls, mean(mis_gkov_cont, dims = 1) |> vec,
    label = "GaoKannanOhViswanath (continuous)", color = :red,
    linestyle = :dot, marker = '●')
axislegend(position = :rb)
fig

Estimation using `DifferentialInfoEstimator`s: a comparison

Let's compare the performance of a subset of the implemented mutual information estimators. We'll use example data from Lord et al., where the analytical mutual information is known.

using Associations
using LinearAlgebra: det
using StateSpaceSets: StateSpaceSet
using Distributions: MvNormal
using LaTeXStrings
using CairoMakie

# adapted from https://juliadatascience.io/makie_colors
function new_cycle_theme()
    # https://nanx.me/ggsci/reference/pal_locuszoom.html
    my_colors = ["#D43F3AFF", "#EEA236FF", "#5CB85CFF", "#46B8DAFF",
        "#357EBDFF", "#9632B8FF", "#B8B8B8FF"]
    cycle = Cycle([:color, :linestyle, :marker], covary=true) # alltogether
    my_markers = [:circle, :rect, :utriangle, :dtriangle, :diamond,
        :pentagon, :cross, :xcross]
    my_linestyle = [nothing, :dash, :dot, :dashdot, :dashdotdot]
    return Theme(
        fontsize = 22, font="CMU Serif",
        colormap = :linear_bmy_10_95_c78_n256,
        palette = (
            color = my_colors,
            marker = my_markers,
            linestyle = my_linestyle,
        ),
        Axis = (
            backgroundcolor= (:white, 0.2),
            xgridstyle = :dash,
            ygridstyle = :dash
        ),
        Lines = (
            cycle= cycle,
        ),
        ScatterLines = (
            cycle = cycle,
        ),
        Scatter = (
            cycle = cycle,
        ),
        Legend = (
            bgcolor = (:grey, 0.05),
            framecolor = (:white, 0.2),
            labelsize = 13,
        )
    )
end

run(est; f::Function, # function that generates data
        base::Real = ℯ,
        nreps::Int = 10,
        αs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1],
        n::Int = 1000) =
    map(α -> association(est, f(α, n)...), αs)

function compute_results(f::Function; estimators, k = 5, k_lord = 20,
        n = 1000, base = ℯ, nreps = 10,
        as = 7:-1:0,
        αs = [1/10^(a) for a in as])

    is = [zeros(length(αs)) for est in estimators]
    for (k, est) in enumerate(estimators)
        tmp = zeros(length(αs))
        for i = 1:nreps
            tmp .+= run(est; f = f, αs, base, n)
        end
        is[k] .= tmp ./ nreps
    end

    return is
end

function plot_results(f::Function, ftrue::Function;
        base, estimators, k_lord, k,
        as = 7:-1:0, αs = [1/10^(a) for a in as], kwargs...
    )
    is = compute_results(f;
        base, estimators, k_lord, k, as, αs, kwargs...)
    itrue = [ftrue(α; base) for α in αs]

    xmin, xmax = minimum(αs), maximum(αs)

    ymin = floor(Int, min(minimum(itrue), minimum(Iterators.flatten(is))))
    ymax = ceil(Int, max(maximum(itrue), maximum(Iterators.flatten(is))))
    f = Figure()
    ax = Axis(f[1, 1],
        xlabel = "α", ylabel = "I (nats)",
        xscale = log10, aspect = 1,
        xticks = (αs, [latexstring("10^{$(-a)}") for a in as]),
        yticks = (ymin:ymax)
        )
    xlims!(ax, (1/10^first(as), 1/10^last(as)))
    ylims!(ax, (ymin, ymax))
    lines!(ax, αs, itrue,
        label = "I (true)", linewidth = 4, color = :black)
    for (i, est) in enumerate(estimators)
        if est isa EntropyDecomposition
            es = typeof(est.est).name.name |> String
        else
            es = typeof(est).name.name |> String
        end
        @show es
        lbl = occursin("Lord", es) ? "$es (k = $k_lord)" : "$es (k = $k)"
        scatter!(ax, αs, is[i], label = lbl)
        lines!(ax, αs, is[i])

    end
    axislegend()
    return f
end

set_theme!(new_cycle_theme())
k_lord = 20
k = 5
base = ℯ

def = MIShannon(base = ℯ)
estimators = [
    EntropyDecomposition(def, Kraskov(; k)),
    EntropyDecomposition(def, KozachenkoLeonenko()),
    EntropyDecomposition(def, Zhu(; k)),
    EntropyDecomposition(def, ZhuSingh(; k)),
    EntropyDecomposition(def, Gao(; k)),
    EntropyDecomposition(def, Lord(; k = k_lord)),
    EntropyDecomposition(def, LeonenkoProzantoSavani(Shannon(); k)),
    KSG1(def; k),
    KSG2(def; k),
    GaoOhViswanath(def; k),
    GaoKannanOhViswanath(def; k),
    GaussianMI(def),
];

12-element Vector{MultivariateInformationMeasureEstimator{MIShannon{Irrational{:ℯ}}}}:
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Kraskov{Shannon{Int64}, Int64}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Kraskov(definition = Shannon(base = 2), k = 5, w = 0, base = 2), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, KozachenkoLeonenko{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), KozachenkoLeonenko(definition = Shannon(base = 2), w = 0), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Zhu{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Zhu(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, ZhuSingh{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), ZhuSingh(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Gao{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Gao(definition = Shannon(base = 2), k = 5, w = 0, corrected = true), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, Lord{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), Lord(definition = Shannon(base = 2), k = 20, w = 0), nothing, nothing)
 EntropyDecomposition{MIShannon{Irrational{:ℯ}}, LeonenkoProzantoSavani{Shannon{Int64}}, Nothing, Nothing}(MIShannon{Irrational{:ℯ}}(ℯ), LeonenkoProzantoSavani(definition = Shannon(base = 2), k = 5, w = 0), nothing, nothing)
 KraskovStögbauerGrassberger1{MIShannon{Irrational{:ℯ}}, Chebyshev, Chebyshev}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Chebyshev(), Chebyshev())
 KraskovStögbauerGrassberger2{MIShannon{Irrational{:ℯ}}, Chebyshev, Chebyshev}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Chebyshev(), Chebyshev())
 GaoOhViswanath{MIShannon{Irrational{:ℯ}}, Euclidean, Euclidean}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0, Euclidean(0.0), Euclidean(0.0))
 GaoKannanOhViswanath{MIShannon{Irrational{:ℯ}}}(MIShannon{Irrational{:ℯ}}(ℯ), 5, 0)
 GaussianMI{MIShannon{Irrational{:ℯ}}}(MIShannon{Irrational{:ℯ}}(ℯ), true)

Example system: family 1

In this system, samples are concentrated around the diagonal $X = Y$, and the strip of samples gets thinner as $\alpha \to 0$.

function family1(α, n::Int)
    x = rand(n)
    v = rand(n)
    y = x + α * v
    return StateSpaceSet(x), StateSpaceSet(y)
end

# True mutual information values for these data
function ifamily1(α; base = ℯ)
    mi = -log(α) - α - log(2)
    return mi / log(base, ℯ)
end

fig = plot_results(family1, ifamily1;
    k_lord = k_lord, k = k, nreps = 10, n = 800,
    estimators = estimators,
    base = base)

Example system: family 2

function family2(α, n::Int)
    Σ = [1 α; α 1]
    N2 = MvNormal(zeros(2), Σ)
    D2 = StateSpaceSet([rand(N2) for i = 1:n])
    X = StateSpaceSet(D2[:, 1])
    Y = StateSpaceSet(D2[:, 2])
    return X, Y
end

function ifamily2(α; base = ℯ)
    return (-0.5 * log(1 - α^2)) / log(ℯ, base)
end

αs = 0.05:0.05:0.95
estimators = estimators
with_theme(new_cycle_theme()) do
    f = Figure();
    ax = Axis(f[1, 1], xlabel = "α", ylabel = "I (nats)")
    is_true = map(α -> ifamily2(α), αs)
    is_est = map(est -> run(est; f = family2, αs, nreps = 20), estimators)
    lines!(ax, αs, is_true,
        label = "I (true)", color = :black, linewidth = 3)
    for (i, est) in enumerate(estimators)
        if est isa EntropyDecomposition
            estname = typeof(est.est).name.name |> String
        else
            estname = typeof(est).name.name |> String
        end
        scatterlines!(ax, αs, is_est[i], label = estname)
    end
    axislegend(position = :lt)
    return f
end

Example system: family 3

In this system, we draw samples from a 4D Gaussian distribution distributed as specified in the ifamily3 function below. We let $X$ be the two first variables, and $Y$ be the two last variables.

function ifamily3(α; base = ℯ)
    Σ = [7 -5 -1 -3; -5 5 -1 3; -1 -1 3 -1; -3 3 -1 2+α]
    Σx = Σ[1:2, 1:2]; Σy = Σ[3:4, 3:4]
    mi = 0.5*log(det(Σx) * det(Σy) / det(Σ))
    return mi / log(ℯ, base)
end

function family3(α, n::Int)
    Σ = [7 -5 -1 -3; -5 5 -1 3; -1 -1 3 -1; -3 3 -1 2+α]
    N4 = MvNormal(zeros(4), Σ)
    D4 = StateSpaceSet([rand(N4) for i = 1:n])
    X = D4[:, 1:2]
    Y = D4[:, 3:4]
    return X, Y
end

fig = plot_results(family3, ifamily3;
    k_lord = k_lord, k = k, nreps = 5, n = 800,
    estimators = estimators, base = base)

We see that the Lord estimator, which estimates local volume elements using a singular-value decomposition (SVD) of local neighborhoods, outperforms the other estimators by a large margin.

`MIRenyiJizba`

`JointProbabilities` + `UniqueElements`

MIRenyiJizba can be estimated for categorical data using JointProbabilities estimator with the UniqueElements outcome space.

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, ["a", "b", "c"], 200);
y = rand(rng, ["hello", "yoyo", "heyhey"], 200);
est = JointProbabilities(MIRenyiJizba(), UniqueElements())
association(est, x, y)

-0.014289937341761379

`EntropyDecomposition` + `LeonenkoProzantoSavani`

MIRenyiJizba can also estimated for numerical data using EntropyDecomposition in combination with any DifferentialInfoEstimator capable of estimating differential Renyi entropy.

using Associations
using Random; rng = Xoshiro(1234)
x = randn(rng, 50); y = randn(rng, 50);
def = MIRenyiJizba()
est_diff = EntropyDecomposition(def, LeonenkoProzantoSavani(Renyi(), k=3))
association(est_diff, x, y)

-0.05308726289365895

`EntropyDecomposition` + `LeonenkoProzantoSavani`

MIRenyiJizba can also estimated for numerical data using EntropyDecomposition in combination with any DiscreteInfoEstimator capable of estimating differential Renyi entropy over some OutcomeSpace, e.g. ValueBinning.

using Associations
using Random; rng = Xoshiro(1234)
x = randn(rng, 50); y = randn(rng, 50);
def = MIRenyiJizba()

disc = CodifyVariables(ValueBinning(2))
est_disc = EntropyDecomposition(def, PlugIn(Renyi()), disc);
association(est_disc, x, y)

0.044138581955657896

`MIRenyiSarbu`

MIRenyiSarbu can be estimated using the JointProbabilities estimator in combination with any CodifyVariables or CodifyPoints discretization scheme.

`JointProbabilities` + `UniqueElements`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, ["a", "b", "c"], 200)
y = rand(rng, ["hello", "yoyo", "heyhey"], 200)

est = JointProbabilities(MIRenyiSarbu(), CodifyVariables(UniqueElements()))
association(est, x, y)

0.01866831291309387

`JointProbabilities` + `CosineSimilarityBinning`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)

est = JointProbabilities(MIRenyiSarbu(), CodifyVariables(CosineSimilarityBinning()))
association(est, x, y)

0.044214606254861136

`MITsallisFuruichi`

`JointProbabilities` + `UniqueElements`

MITsallisFuruichi can be estimated using the JointProbabilities estimator in combination with any CodifyVariables or CodifyPoints discretization scheme.

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)

est = JointProbabilities(MITsallisFuruichi(q = 0.3), UniqueElements())
association(est, x, y)

-2.8721220837828123

`EntropyDecomposition` + `LeonenkoProzantoSavani`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)

est_diff = EntropyDecomposition(MITsallisFuruichi(), LeonenkoProzantoSavani(Tsallis(q= 2)))
association(est_diff, x, y)

-0.11358551843093931

`EntropyDecomposition` + `Dispersion`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
disc = CodifyVariables(Dispersion())
est_disc = EntropyDecomposition(MITsallisFuruichi(), PlugIn(Tsallis()), disc)

association(est_disc, x, y)

0.345168809758591

`MITsallisMartin`

`JointProbabilities` + `UniqueElements`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)

est = JointProbabilities(MITsallisMartin(q = 1.5), UniqueElements())
association(est, x, y)

-26.28427124746185

`EntropyDecomposition` + `LeonenkoProzantoSavani`

MITsallisMartin can be estimated using a decomposition into entropy terms using EntropyDecomposition. This is done by using estimators from ComplexityMeasures.jl. We can use any compatible InformationMeasureEstimator that can estimate differential Tsallis entropy from ComplexityMeasures.jl.

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 500)
y = rand(rng, 500)

est_diff = EntropyDecomposition(MITsallisMartin(), LeonenkoProzantoSavani(Tsallis(q= 1.5)))
association(est_diff, x, y)

0.1241248567715774

`EntropyDecomposition` + `OrdinalPatterns`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 200)
y = rand(rng, 200)
disc = CodifyVariables(OrdinalPatterns())
est_disc = EntropyDecomposition(MITsallisMartin(), PlugIn(Tsallis()), disc)

association(est_disc, x, y)

1.3973731206049993

`CMIShannon`

`CMIShannon` with `GaussianCMI`

using Associations
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = randn(1000)
y = randn(1000) .+ x
z = randn(1000) .+ y
association(GaussianCMI(), x, z, y) # defaults to `CMIShannon()`

0.7700158454651391

`CMIShannon` with `FPVP`

using Associations
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(FPVP(k = 5), x, z, y) # defaults to `CMIShannon()`

-0.14006746106709703

`CMIShannon` with `MesnerShalizi`

using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)

n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(MesnerShalizi(; k = 10), x, z, y) # defaults to `CMIShannon()`

-0.11407574318482466

`CMIShannon` with `Rahimzamani`

using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)

n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
association(Rahimzamani(CMIShannon(base = 10); k = 10), x, z, y)

-0.0343402204762932

`MIDecomposition`

Shannon-type conditional mutual information can be decomposed as a sum of mutual information terms, which we can each estimate with any dedicated MutualInformationEstimator estimator.

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 300)
y = rand(rng, 300) .+ x
z = rand(rng, 300) .+ y

est = MIDecomposition(CMIShannon(), KSG1(MIShannon(base = 2), k = 3))
association(est, x, z, y) # should be near 0 (and can be negative)

`CMIRenyiPoczos`

`PoczosSchneiderCMI`

using Associations
using Distributions
using Statistics
using Random; rng = Xoshiro(1234)

n = 1000
# A chain X → Y → Z
x = rand(rng, Normal(-1, 0.5), n)
y = rand(rng, BetaPrime(0.5, 1.5), n) .+ x
z = rand(rng, Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
est = PoczosSchneiderCMI(CMIRenyiPoczos(base = 2, q = 1.2); k = 5)
association(est, x, z, y)

-0.39801429256003873

In addition to the dedicated ConditionalMutualInformationEstimators, any MutualInformationEstimator can also be used to compute conditional mutual information using the chain rule of mutual information. However, the naive application of these estimators don't perform any bias correction when taking the difference of mutual information terms.

`CMIShannon`

`MIDecomposition` + `KraskovStögbauerGrassberger2`

using Associations
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
est = MIDecomposition(CMIShannon(base = 2), KSG1(k = 10))
association(est, x, z, y)

-0.40595822379689694

`ShortExpansionConditionalMutualInformation`

`JointProbabilities` with `CodifyVariables` and `ValueBinning`

using Associations
using Test
using Random; rng = Xoshiro(1234)
n = 20
x = rand(rng, n)
y = randn(rng, n) .+ x .^ 2
z = randn(rng, n) .* y

# An estimator for estimating the SECMI measure
est = JointProbabilities(SECMI(base = 2), CodifyVariables(ValueBinning(3)))
association(est, x, z, y)

0.2398365741064942

`EntropyDecomposition` + `Kraskov`

Any DifferentialInfoEstimator can also be used to compute conditional mutual information using a sum of entropies. For that, we usethe EntropyDecomposition estimator. No bias correction is applied for EntropyDecomposition either.

using Associations
using Distributions
using Random; rng = Xoshiro(1234)
n = 500
# A chain X → Y → Z
x = rand(rng, Epanechnikov(0.5, 1.0), n)
y = rand(rng, Normal(0, 0.2), n) .+ x
z = rand(rng, FDist(3, 2), n)
est = EntropyDecomposition(CMIShannon(), Kraskov(k = 5))
association(est, x, z, y)

-0.40513415764994387

Any DiscreteInfoEstimator that computes entropy can also be used to compute conditional mutual information using a sum of entropies. For that, we also use EntropyDecomposition. In the discrete case, we also have to specify a discretization (an OutcomeSpace).

`EntropyDecomposition` + `ValueBinning`

using Associations
using Distributions
using Random; rng = Xoshiro(1234)
n = 500
# A chain X → Y → Z
x = rand(rng, Epanechnikov(0.5, 1.0), n)
y = rand(rng, Normal(0, 0.2), n) .+ x
z = rand(rng, FDist(3, 2), n)
discretization = CodifyVariables(ValueBinning(RectangularBinning(5)))
hest = PlugIn(Shannon())
est = EntropyDecomposition(CMIShannon(), hest, discretization)
association(est, x, y, z)

0.8238141245350172

`CMIRenyiJizba`

`JointProbabilities` + `BubbleSortSwaps`

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 100)
y = x .+ rand(rng, 100)
z = y .+ rand(rng, 100)
disc = CodifyVariables(BubbleSortSwaps(m = 4))
est = JointProbabilities(CMIRenyiJizba(), disc)
association(est, x, z, y)

0.43284279366170564

`EntropyDecomposition` + `LeonenkoProzantoSavani`

using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1000), rand(rng, 1000), rand(rng, 1000)
def = CMIRenyiJizba(q = 1.5)

# Using a differential Rényi entropy estimator
est = EntropyDecomposition(def, LeonenkoProzantoSavani(Renyi(), k = 10))
association(est, x, y, z)

-0.03274084267284355

`EntropyDecomposition` + `OrdinalPatterns`

using Associations
using Random; rng = Xoshiro(1234)
x, y, z = rand(rng, 1000), rand(rng, 1000), rand(rng, 1000)
def = CMIRenyiJizba(q = 1.5)

# Using a plug-in Rényi entropy estimator, discretizing using ordinal patterns.
est = EntropyDecomposition(def, PlugIn(Renyi()), CodifyVariables(OrdinalPatterns(m=2)), RelativeAmount())
association(est, x, y, z)

3.961338273517079e-6

`TEShannon`

`EntropyDecomposition` + `TransferOperator`

For transfer entropy examples, we'll construct some time series for which there is time-delayed forcing between variables.

using Associations
using DynamicalSystemsBase
using StableRNGs
rng = StableRNG(123)

Base.@kwdef struct Logistic4Chain{V, RX, RY, RZ, RW, C1, C2, C3, Σ1, Σ2, Σ3, RNG}
    xi::V = [0.1, 0.2, 0.3, 0.4]
    rx::RX = 3.9
    ry::RY = 3.6
    rz::RZ = 3.6
    rw::RW = 3.8
    c_xy::C1 = 0.4
    c_yz::C2 = 0.4
    c_zw::C3 = 0.35
    σ_xy::Σ1 = 0.05
    σ_yz::Σ2 = 0.05
    σ_zw::Σ3 = 0.05
    rng::RNG = Random.default_rng()
end

function eom_logistic4_chain(u, p::Logistic4Chain, t)
    (; xi, rx, ry, rz, rw, c_xy, c_yz, c_zw, σ_xy, σ_yz, σ_zw, rng) = p
    x, y, z, w = u
    f_xy = (y +  c_xy*(x + σ_xy * rand(rng)) ) / (1 + c_xy*(1+σ_xy))
    f_yz = (z +  c_yz*(y + σ_yz * rand(rng)) ) / (1 + c_yz*(1+σ_yz))
    f_zw = (w +  c_zw*(z + σ_zw * rand(rng)) ) / (1 + c_zw*(1+σ_zw))
    dx = rx * x * (1 - x)
    dy = ry * (f_xy) * (1 - f_xy)
    dz = rz * (f_yz) * (1 - f_yz)
    dw = rw * (f_zw) * (1 - f_zw)
    return SVector{4}(dx, dy, dz, dw)
end

function system(definition::Logistic4Chain)
    return DiscreteDynamicalSystem(eom_logistic4_chain, definition.xi, definition)
end

# An example system where `X → Y → Z → W`.
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))

precise = true # precise bin edges
discretization = CodifyVariables(TransferOperator(RectangularBinning(2, precise))) #
est_disc_to = EntropyDecomposition(TEShannon(), PlugIn(Shannon()), discretization);
association(est_disc_to, x, y), association(est_disc_to, y, x)

(0.25259115482212113, 0.11013462447660904)

The Shannon-type transfer entropy from x to y is stronger than from y to x, which is what we expect if x drives y.

association(est_disc_to, x, z), association(est_disc_to, x, z, y)

(0.05466427941147756, 0.024434965292184252)

The Shannon-type transfer entropy from x to z is stronger than the transfer entropy from x to z given y. This is expected, because x drives z through y, so "conditioning away" the effect of y should decrease the estimated information transfer.

`CMIDecomposition`

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 1000)
y = rand(rng, 1000) .+ x
z = rand(rng, 1000) .+ y

# Estimate transfer entropy by representing it as a CMI and using the `FPVP` estimator.
est = CMIDecomposition(TEShannon(base = 2), FPVP(k = 3))
association(est, x, z, y) # should be near 0 (and can be negative)

-0.26323146509776046

`SymbolicTransferEntropy` estimator

The SymbolicTransferEntropy estimator is just a convenience wrapper which utilizes CodifyVariableswith the OrdinalPatterns outcome space to discretize the input time series before computing transfer entropy.

We'll use coupled time series from the logistic4 system above, where x → y → z → w. Thus, we expect that the association for the direction x → y is larger than for y → x. We also expect an association x → z, but the association should weaken when conditioning on the intermediate value y.

using Associations
using DynamicalSystemsBase
using Random; rng = Xoshiro(1234)
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))
est = SymbolicTransferEntropy(m = 5)
association(est, x, y), association(est, y, x), association(est, x, z), association(est, x, z, y)

(0.5174597240325244, 0.24773918384758742, 0.7233610447154118, 0.1322241113229709)

Comparing different estimators

Let's reproduce Figure 4 from Zhu et al. (2015), where they test some dedicated transfer entropy estimators on a bivariate autoregressive system. We will test

The Lindner and Zhu1 dedicated transfer entropy estimators, which try to eliminate bias.
The KraskovStögbauerGrassberger2 estimator, which computes TE naively as a sum of mutual information terms (without guaranteed cancellation of biases for the total sum).
The Kraskov estimator, which computes TE naively as a sum of entropy terms (without guaranteed cancellation of biases for the total sum).

using Associations
using CairoMakie
using Statistics
using Distributions: Normal

function model2(n::Int)
    𝒩x = Normal(0, 0.1)
    𝒩y = Normal(0, 0.1)
    x = zeros(n+2)
    y = zeros(n+2)
    x[1] = rand(𝒩x)
    x[2] = rand(𝒩x)
    y[1] = rand(𝒩y)
    y[2] = rand(𝒩y)

    for i = 3:n+2
        x[i] = 0.45*sqrt(2)*x[i-1] - 0.9*x[i-2] - 0.6*y[i-2] + rand(𝒩x)
        y[i] = 0.6*x[i-2] - 0.175*sqrt(2)*y[i-1] + 0.55*sqrt(2)*y[i-2] + rand(𝒩y)
    end
    return x[3:end], y[3:end]
end
te_true = 0.42 # eyeball the theoretical value from their Figure 4.

m = TEShannon(embedding = EmbeddingTE(dT = 2, dS = 2), base = ℯ)
estimators = [
    Zhu1(m, k = 8),
    Lindner(m, k = 8),
    MIDecomposition(m, KSG1(k = 8)),
    EntropyDecomposition(m, Kraskov(k = 8)),
]
Ls = [floor(Int, 2^i) for i in 8.0:0.5:11]
nreps = 8
tes_xy = [[zeros(nreps) for i = 1:length(Ls)] for e in estimators]
tes_yx = [[zeros(nreps) for i = 1:length(Ls)] for e in estimators]
for (k, est) in enumerate(estimators)
    for (i, L) in enumerate(Ls)
        for j = 1:nreps
            x, y = model2(L);
            tes_xy[k][i][j] = association(est, x, y)
            tes_yx[k][i][j] = association(est, y, x)
        end
    end
end

ymin = minimum(map(x -> minimum(Iterators.flatten(Iterators.flatten(x))), (tes_xy, tes_yx)))
estimator_names = ["Zhu1", "Lindner", "KSG1", "Kraskov"]
ls = [:dash, :dot, :dash, :dot]
mr = [:rect, :hexagon, :xcross, :pentagon]

fig = Figure(resolution = (800, 350))
ax_xy = Axis(fig[1,1], xlabel = "Signal length", ylabel = "TE (nats)", title = "x → y")
ax_yx = Axis(fig[1,2], xlabel = "Signal length", ylabel = "TE (nats)", title = "y → x")
for (k, e) in enumerate(estimators)
    label = estimator_names[k]
    marker = mr[k]
    scatterlines!(ax_xy, Ls, mean.(tes_xy[k]); label, marker)
    scatterlines!(ax_yx, Ls, mean.(tes_yx[k]); label, marker)
    hlines!(ax_xy, [te_true]; xmin = 0.0, xmax = 1.0, linestyle = :dash, color = :black)
    hlines!(ax_yx, [te_true]; xmin = 0.0, xmax = 1.0, linestyle = :dash, color = :black)
    linkaxes!(ax_xy, ax_yx)
end
axislegend(ax_xy, position = :rb)

fig

Reproducing Schreiber (2000)

Let's try to reproduce the results from Schreiber's original paper (Schreiber, 2000) where he introduced the transfer entropy. We'll here use the JointProbabilities estimator, discretizing per column of the input data using the CodifyVariables discretization scheme with the ValueBinning outcome space.

using Associations
using DynamicalSystemsBase
using CairoMakie
using Statistics
using Random; Random.seed!(12234);

function ulam_system(dx, x, p, t)
    f(x) = 2 - x^2
    ε = p[1]
    dx[1] = f(ε*x[length(dx)] + (1-ε)*x[1])
    for i in 2:length(dx)
        dx[i] = f(ε*x[i-1] + (1-ε)*x[i])
    end
end

ds = DiscreteDynamicalSystem(ulam_system, rand(100) .- 0.5, [0.04])
first(trajectory(ds, 1000; Ttr = 1000));

εs = 0.02:0.02:1.0
te_x1x2 = zeros(length(εs)); te_x2x1 = zeros(length(εs))
# Guess an appropriate bin width of 0.2 for the histogram
disc = CodifyVariables(ValueHistogram(0.2))
est = JointProbabilities(TEShannon(; base = 2), disc)

for (i, ε) in enumerate(εs)
    set_parameter!(ds, 1, ε)
    tr = first(trajectory(ds, 300; Ttr = 5000))
    X1 = tr[:, 1]; X2 = tr[:, 2]
    @assert !any(isnan, X1)
    @assert !any(isnan, X2)
    te_x1x2[i] = association(est, X1, X2)
    te_x2x1[i] = association(est, X2, X1)
end

fig = Figure(size = (800, 600))
ax = Axis(fig[1, 1], xlabel = "epsilon", ylabel = "Transfer entropy (bits)")
lines!(ax, εs, te_x1x2, label = "X1 to X2", color = :black, linewidth = 1.5)
lines!(ax, εs, te_x2x1, label = "X2 to X1", color = :red, linewidth = 1.5)
axislegend(ax, position = :lt)
return fig

As expected, transfer entropy from X1 to X2 is higher than from X2 to X1 across parameter values for ε. But, by our definition of the ulam system, dynamical coupling only occurs from X1 to X2. The results, however, show nonzero transfer entropy in both directions. What does this mean?

Computing transfer entropy from finite time series introduces bias, and so does any particular choice of entropy estimator used to calculate it. To determine whether a transfer entropy estimate should be trusted, we can employ surrogate testing. We'll generate surrogate using TimeseriesSurrogates.jl. One possible way to do so is to use a SurrogateAssociationTest with independence, but here we'll do the surrogate resampling manually, so we can plot and inspect the results.

In the example below, we continue with the same time series generated above. However, at each value of ε, we also compute transfer entropy for nsurr = 50 different randomly shuffled (permuted) versions of the source process. If the original transfer entropy exceeds that of some percentile the transfer entropy estimates of the surrogate ensemble, we will take that as "significant" transfer entropy.

nsurr = 25 # in real applications, you should use more surrogates
base = 2
te_x1x2 = zeros(length(εs)); te_x2x1 = zeros(length(εs))
te_x1x2_surr = zeros(length(εs), nsurr); te_x2x1_surr = zeros(length(εs), nsurr)

# use same bin-width as before
disc = CodifyVariables(ValueHistogram(0.2))
est = JointProbabilities(TEShannon(; base = 2), disc)

for (i, ε) in enumerate(εs)
    set_parameter!(ds, 1, ε)
    tr = first(trajectory(ds, 300; Ttr = 5000))
    X1 = tr[:, 1]; X2 = tr[:, 2]
    @assert !any(isnan, X1)
    @assert !any(isnan, X2)
    te_x1x2[i] = association(est, X1, X2)
    te_x2x1[i] = association(est, X2, X1)
    s1 = surrogenerator(X1, RandomShuffle()); s2 = surrogenerator(X2, RandomShuffle())

    for j = 1:nsurr
        te_x1x2_surr[i, j] = association(est, s1(), X2)
        te_x2x1_surr[i, j] = association(est, s2(), X1)
    end
end

# Compute 95th percentiles of the surrogates for each ε
qs_x1x2 = [quantile(te_x1x2_surr[i, :], 0.95) for i = 1:length(εs)]
qs_x2x1 = [quantile(te_x2x1_surr[i, :], 0.95) for i = 1:length(εs)]

fig = with_theme(theme_minimal(), markersize = 2) do
    fig = Figure()
    ax = Axis(fig[1, 1], xlabel = "epsilon", ylabel = "Transfer entropy (bits)")
    scatterlines!(ax, εs, te_x1x2, label = "X1 to X2", color = :black, linewidth = 1.5)
    scatterlines!(ax, εs, qs_x1x2, color = :black, linestyle = :dot, linewidth = 1.5)
    scatterlines!(ax, εs, te_x2x1, label = "X2 to X1", color = :red)
    scatterlines!(ax, εs, qs_x2x1, color = :red, linestyle = :dot)
    axislegend(ax, position = :lt)
    return fig
end
fig

The plot above shows the original transfer entropies (solid lines) and the 95th percentile transfer entropies of the surrogate ensembles (dotted lines). As expected, using the surrogate test, the transfer entropies from X1 to X2 are mostly significant (solid black line is above dashed black line). The transfer entropies from X2 to X1, on the other hand, are mostly not significant (red solid line is below red dotted line).

`TERenyiJizba`

`EntropyDecomposition` + `TransferOperator`

We can perform the same type of analysis as above using TERenyiJizba instead of TEShannon.

using Associations
using DynamicalSystemsBase
using StableRNGs; rng = StableRNG(123)

# An example system where `X → Y → Z → W`.
sys = system(Logistic4Chain(; rng))
x, y, z, w = columns(first(trajectory(sys, 300, Ttr = 10000)))

precise = true # precise bin edges
discretization = CodifyVariables(TransferOperator(RectangularBinning(2, precise))) #
est_disc_to = EntropyDecomposition(TERenyiJizba(), PlugIn(Renyi()), discretization);
association(est_disc_to, x, y), association(est_disc_to, y, x)

(0.25515563518659623, 0.12262554759426325)

`ConvergentCrossMapping`

`RandomVectors` estimator

When cross-mapping with the RandomVectors estimator, a single random subsample of time indices (i.e. not in any particular order) of length l is drawn for each library size l, and cross mapping is performed using the embedding vectors corresponding to those time indices.

using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)

# We'll draw a single sample at each `l ∈ libsizes`. Sampling with replacement is then
# necessary, because our 200-pt timeseries will result in embeddings with
# less than 200 points.
est = RandomVectors(ConvergentCrossMapping(d = 3); libsizes = 50:25:200, replace = true, rng)
crossmap(est, x, y)

7-element Vector{Float64}:
 0.403543378354588
 0.5856987205283815
 0.44287920268716086
 0.4966767048563532
 0.4355290972758533
 0.6832809291387226
 0.7311864792647486

To generate a distribution of cross-map estimates for each l ∈ libsizes, just call crossmap repeatedly, e.g.

using Associations
using Random; rng = MersenneTwister(1234)
using Statistics

x, y = randn(rng, 300), randn(rng, 300)
def = ConvergentCrossMapping(d = 3)
libsizes = 25:25:200

ρs = [[crossmap(RandomVectors(def; libsizes = L, replace = true, rng), x, y) for i = 1:50] for L in libsizes]

using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f

Now, the k-th element of ρs contains 80 estimates of the correspondence measure ρ at library size libsizes[k].

`RandomSegment` estimator

When cross-mapping with the RandomSegment estimator, a single random subsample of continguous, ordered time indices of length l is drawn for each library size l, and cross mapping is performed using the embedding vectors corresponding to those time indices.

using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)

# We'll draw a single sample at each `l ∈ libsizes`. We limit the library size to 100,
# because drawing segments of the data longer than half the available data doesn't make
# much sense.
est = RandomSegment(ConvergentCrossMapping(d = 3); libsizes = 50:25:100, rng)
crossmap(est, x, y)

3-element Vector{Float64}:
 0.16203695962111725
 0.30717005220560617
 0.27363894462301874

As above, to generate a distribution of cross-map estimates for each l ∈ libsizes, just call crossmap repeatedly, e.g.

using Associations
using Random; rng = MersenneTwister(1234)
using Statistics

x, y = randn(rng, 200), randn(rng, 200)
def = ConvergentCrossMapping(d = 3)
libsizes = 25:25:100

ρs = [[crossmap(RandomSegment(def; libsizes = L, rng), x, y) for i = 1:50] for L in libsizes]

f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f

`PairwiseAsymmetricInference`

We repeat the analyses above, but here use the pairwise asymmetric inference algorithm instead of the convergent cross map algorithm.

`RandomVectors` estimator

using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 300), randn(rng, 300)

# We'll draw a single sample at each `l ∈ libsizes`. Sampling with replacement is then
# necessary, because our 200-pt timeseries will result in embeddings with
# less than 200 points.
est = RandomVectors(PairwiseAsymmetricInference(d = 3); libsizes = 50:25:200, replace = true, rng)
crossmap(est, x, y)

7-element Vector{Float64}:
 0.02148895635319534
 0.4131471276386324
 0.16500925285819215
 0.3427771476921493
 0.43406208466336454
 0.536744307953195
 0.7005991361206654

To generate a distribution of cross-map estimates for each l ∈ libsizes, just call crossmap repeatedly, e.g.

using Associations
using Random; rng = MersenneTwister(1234)
using Statistics

x, y = randn(rng, 300), randn(rng,300)
def = PairwiseAsymmetricInference(d = 3)
libsizes = 25:25:200

ρs = [[crossmap(RandomVectors(def; libsizes = L, replace = true, rng), x, y) for i = 1:50] for L in libsizes]

using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f

`RandomSegment` estimator

using Associations
using Random; rng = MersenneTwister(1234)
x, y = randn(rng, 200), randn(rng, 200)

# We'll draw a single sample at each `l ∈ libsizes`. We limit the library size to 100,
# because drawing segments of the data longer than half the available data doesn't make
# much sense.
est = RandomSegment(PairwiseAsymmetricInference(d = 3); libsizes = 50:25:100, rng)
crossmap(est, x, y)

3-element Vector{Float64}:
 -0.42394655036151907
  0.04597997698646141
 -0.13236022998996033

As above, to generate a distribution of cross-map estimates for each l ∈ libsizes, just call crossmap repeatedly, e.g.

using Associations
using Random; rng = MersenneTwister(1234)
using Statistics

x, y = randn(rng, 300), randn(rng, 300)
def = PairwiseAsymmetricInference(d = 3)
libsizes = 25:25:100

ρs = [[crossmap(RandomSegment(def; libsizes = L, rng), x, y) for i = 1:50] for L in libsizes]

using CairoMakie
f = Figure(); ax = Axis(f[1, 1]);
plot!(ax, libsizes, mean.(ρs))
errorbars!(ax, libsizes, mean.(ρs), std.(ρs))
f

`MCR`

To quantify association by the mean conditional probability of recurrence (MCR), we'll create a chain of variables where X drives Y, which in turn drives Z. We then expect there to be significant detectable association between both X and Y, Y and Z and also X and Z (because Y transfers information from X to Z. We expect the association between X and Z to disappear when conditioning on Y (since we're then "removing the effect" of Y).

using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 300); y = rand(rng, 300) .* sin.(x); z = rand(rng, 300) .* y;
est = MCR(r = 0.5)
association(est, x, y), association(est, x, z), association(est, y, z), association(est, x, z, y)

(0.7711920426340347, 0.7420689143732229, 0.924270403653336, -0.00798860118526612)

The interpretation of the MCR measure is that if two variables are symmetrically coupled, then the conditional recurrence in both directions is equal. Two variables that are uncoupled are symmetrically coupled (i.e. no coupling). We therefore expect the difference in conditional recurrence to be around zero.

using Associations
using Random; rng = Xoshiro(1234)
x = rand(rng, 300)
y = rand(rng, 300)
m = MCR(r = 0.5)
Δ = association(m, x, y) - association(m, y, x)

-0.008308662176546688

`RMCD`

To quantify association by the recurrence measure of conditional dependence (RMCD), we'll create a chain of variables where X drives Y, which in turn drives Z. We then expect there to be significant detectable association between both X and Y, Y and Z and also X and Z (because Y transfers information from X to Z. We expect the association between X and Z to disappear when conditioning on Y (since we're then "removing the effect" of Y).

using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 300); y = rand(rng, 300) .* sin.(x); z = rand(rng, 300) .* y;
est = RMCD(r = 0.5)
association(est, x, y), association(est, x, z), association(est, x, z, y)

(0.022387457155671135, 0.022129475954601702, 0.0)

`ChatterjeeCorrelation`

using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 120)
y = rand(rng, 120) .* sin.(x)

120-element Vector{Float64}:
 0.2755719299262952
 0.2070408113032807
 0.6193365054669813
 0.008237347924455006
 0.04148542594436429
 0.313817853101071
 0.6257653008229676
 0.43134904735215984
 0.6829986456858073
 0.3095824205696399
 ⋮
 0.02169777371442121
 0.7764979588125409
 0.1660944201176525
 0.10967358670001909
 0.03230334731057383
 0.026337556651936055
 0.11124764808654403
 0.2599193572475318
 0.4628271687638507

By construction, there will almust surely be no ties in x, so we can use the fast estimate by setting handle_ties == false.

association(ChatterjeeCorrelation(handle_ties = false), x, y)

0.2726578234599625

If we have data where we know there are ties in the data, then we should set handle_ties == true.

w = rand(rng, 1:10, 120) # there will be some ties
z = rand(rng, 1:15, 120) .* sin.(w) # introduce some dependence
association(ChatterjeeCorrelation(handle_ties = true), w, z)

0.618104077588904

`AzadkiaChatterjeeCoefficient`

using Associations
using Random; rng = Xoshiro(1234);
x = rand(rng, 120)
y = rand(rng, 120) .* x
z = rand(rng, 120) .+ y

120-element Vector{Float64}:
 0.44737499407485565
 0.5516487384954861
 0.7581708420215838
 0.23074778575472404
 0.2500819453735219
 0.7353227843349248
 1.1500496733900634
 0.7621477984126711
 0.851799327542775
 0.5247237075962297
 ⋮
 0.3943206855229302
 1.838156393683287
 0.9709273173395769
 0.23333040123923995
 0.26781788473231355
 0.40232936334694847
 0.26839516685160714
 1.0709803535460094
 0.8668408207007512

For the variables above, where x → y → z, we expect stronger assocation between x and y than between x and z. We also expect the strength of the association between x and z to drop when conditioning on y, because y is the variable that connects x and z.

m = AzadkiaChatterjeeCoefficient(theiler = 0) # only exclude self-neighbors
association(m, x, y), association(m, x, z), association(m, x, z, y)

(0.3062018195708035, 0.06951871657754011, 0.010810810810810811)

JointProbabilities + OrdinalPatterns

JointProbabilities + OrdinalPatterns

JointProbabilities + OrdinalPatterns

JointProbabilities + OrdinalPatterns

JointProbabilities with Dispersion

JointProbabilities with OrdinalPatterns

JointProbabilities with OrdinalPatterns

JointProbabilities + CodifyVariables + UniqueElements

JointProbabilities + CodifyPoints + UniqueElementsEncoding

JointProbabilities + CodifyVariables + UniqueElements

JointProbabilities + CodifyPoints + UniqueElementsEncoding

JointProbabilities + CodifyVariables + UniqueElements

JointProbabilities + CodifyPoints + UniqueElementsEncoding

JointProbabilities + ValueBinning

JointProbabilities + UniqueElements

Dedicated GaussianMI estimator

Dedicated KraskovStögbauerGrassberger2 estimator

Dedicated KraskovStögbauerGrassberger2 estimator

Dedicated GaoKannanOhViswanath estimator

EntropyDecomposition + Kraskov

EntropyDecomposition + BubbleSortSwaps

EntropyDecomposition + Jackknife + ValueBinning

Estimation using DifferentialInfoEstimators: a comparison

JointProbabilities + UniqueElements

EntropyDecomposition + LeonenkoProzantoSavani

EntropyDecomposition + LeonenkoProzantoSavani

JointProbabilities + UniqueElements

JointProbabilities + CosineSimilarityBinning

JointProbabilities + UniqueElements

EntropyDecomposition + LeonenkoProzantoSavani

EntropyDecomposition + Dispersion

JointProbabilities + UniqueElements

EntropyDecomposition + LeonenkoProzantoSavani

EntropyDecomposition + OrdinalPatterns

CMIShannon with GaussianCMI

CMIShannon with FPVP

CMIShannon with MesnerShalizi

CMIShannon with Rahimzamani

MIDecomposition + KraskovStögbauerGrassberger2

JointProbabilities with CodifyVariables and ValueBinning

EntropyDecomposition + Kraskov

EntropyDecomposition + ValueBinning

JointProbabilities + BubbleSortSwaps

EntropyDecomposition + LeonenkoProzantoSavani

EntropyDecomposition + OrdinalPatterns

EntropyDecomposition + TransferOperator

SymbolicTransferEntropy estimator

EntropyDecomposition + TransferOperator

RandomVectors estimator

RandomSegment estimator

RandomVectors estimator

RandomSegment estimator

`JointProbabilities` + `OrdinalPatterns`

`JointProbabilities` + `OrdinalPatterns`

`JointProbabilities` + `OrdinalPatterns`

`JointProbabilities` + `OrdinalPatterns`

`JointProbabilities` with `Dispersion`

`JointProbabilities` with `OrdinalPatterns`

`JointProbabilities` with `OrdinalPatterns`

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

`JointProbabilities` + `CodifyVariables` + `UniqueElements`

`JointProbabilities` + `CodifyPoints` + `UniqueElementsEncoding`

`JointProbabilities` + `ValueBinning`

`JointProbabilities` + `UniqueElements`

Dedicated `GaussianMI` estimator

Dedicated `KraskovStögbauerGrassberger2` estimator

Dedicated `KraskovStögbauerGrassberger2` estimator

Dedicated `GaoKannanOhViswanath` estimator

`EntropyDecomposition` + `Kraskov`

`EntropyDecomposition` + `BubbleSortSwaps`

`EntropyDecomposition` + `Jackknife` + `ValueBinning`

Estimation using `DifferentialInfoEstimator`s: a comparison

`JointProbabilities` + `UniqueElements`

`EntropyDecomposition` + `LeonenkoProzantoSavani`

`EntropyDecomposition` + `LeonenkoProzantoSavani`

`JointProbabilities` + `UniqueElements`

`JointProbabilities` + `CosineSimilarityBinning`

`JointProbabilities` + `UniqueElements`

`EntropyDecomposition` + `LeonenkoProzantoSavani`

`EntropyDecomposition` + `Dispersion`

`JointProbabilities` + `UniqueElements`

`EntropyDecomposition` + `LeonenkoProzantoSavani`

`EntropyDecomposition` + `OrdinalPatterns`

`CMIShannon` with `GaussianCMI`

`CMIShannon` with `FPVP`

`CMIShannon` with `MesnerShalizi`

`CMIShannon` with `Rahimzamani`

`MIDecomposition` + `KraskovStögbauerGrassberger2`

`JointProbabilities` with `CodifyVariables` and `ValueBinning`

`EntropyDecomposition` + `Kraskov`

`EntropyDecomposition` + `ValueBinning`

`JointProbabilities` + `BubbleSortSwaps`

`EntropyDecomposition` + `LeonenkoProzantoSavani`

`EntropyDecomposition` + `OrdinalPatterns`

`EntropyDecomposition` + `TransferOperator`

`SymbolicTransferEntropy` estimator

`EntropyDecomposition` + `TransferOperator`

`RandomVectors` estimator

`RandomSegment` estimator

`RandomVectors` estimator

`RandomSegment` estimator