Multivariate information API

ComplexityMeasures.information — Method

information(est::MultivariateInformationMeasureEstimator, x...)

Estimate some MultivariateInformationMeasure on input data x..., using the given MultivariateInformationMeasureEstimator.

This is just a convenience wrapper around association(est, x...).

source

Associations.MultivariateInformationMeasureEstimator — Type

MultivariateInformationMeasureEstimator

The supertype for all estimators of multivariate information measures.

Generic implementations

JointProbabilities
EntropyDecomposition
MIDecomposition
CMIDecomposition

Dedicated implementations

MutualInformationEstimators:

KraskovStögbauerGrassberger1
KraskovStögbauerGrassberger2
GaoOhViswanath
GaoKannanOhViswanath
GaussianMI

ConditionalMutualInformationEstimators:

FPVP
MesnerShalizi
Rahimzamani
PoczosSchneiderCMI
GaussianCMI

TransferEntropyEstimators:

Zhu1
Lindner

source

Estimators

Generic estimators

We provide a set of generic estimators that can be used to calculate potentially several types of information measures.

Associations.JointProbabilities — Type

JointProbabilities <: InformationMeasureEstimator
JointProbabilities(
    definition::MultivariateInformationMeasure,
    discretization::Discretization
)

JointProbabilities is a generic estimator for multivariate discrete information measures.

Usage

Use with association to compute an information measure from input data.

Description

It first encodes the input data according to the given discretization, then constructs probs, a multidimensional Probabilities instance. Finally, probs are forwarded to a PlugIn estimator, which computes the measure according to definition.

Compatible encoding schemes

CodifyVariables (encode each variable/column of the input data independently by applying an encoding in a sliding window over each input variable).
CodifyPoints (encode each point/column of the input data)

Works for any OutcomeSpace that implements codify.

Joint probabilities vs decomposition methods

Using JointProbabilities to compute an information measure, e.g. conditional mutual estimation, is typically slower than other dedicated estimation procedures like EntropyDecomposition. The reason is that measures such as CMIShannon can be formulated as a sum of four entropies, which can be estimated individually and summed afterwards. This decomposition is fast because because we avoid explicitly estimating the entire joint pmf, which demands many extra calculation steps, However, the decomposition is biased, because it fails to fully take into consideration the joint relationships between the variables. Pick your estimator according to your needs.

source

Associations.EntropyDecomposition — Type

EntropyDecomposition(definition::MultivariateInformationMeasure, 
    est::DifferentialInfoEstimator)
EntropyDecomposition(definition::MultivariateInformationMeasure,
    est::DiscreteInfoEstimator,
    discretization::CodifyVariables{<:OutcomeSpace},
    pest::ProbabilitiesEstimator = RelativeAmount())

Estimate the multivariate information measure specified by definition by rewriting its formula into some combination of entropy terms.

If calling the second method (discrete variant), then discretization is always done per variable/column and each column is encoded into integers using codify.

Usage

Use with association to compute a MultivariateInformationMeasure from input data: association(est::EntropyDecomposition, x...).
Use with some IndependenceTest to test for independence between variables.

Description

The entropy terms are estimated using est, and then combined to form the final estimate of definition. No bias correction is applied. If est is a DifferentialInfoEstimator, then discretization and pest are ignored. If est is a DiscreteInfoEstimator, then discretization and a probabilities estimator pest must also be provided (default to RelativeAmount, which uses naive plug-in probabilities).

Compatible differential information estimators

If using the first signature, any compatible DifferentialInfoEstimator can be used.

Compatible outcome spaces for discrete estimation

If using the second signature, the outcome spaces can be used for discretisation. Note that not all outcome spaces will work with all measures.

Estimator	Principle
`UniqueElements`	Count of unique elements
`ValueBinning`	Binning (histogram)
`OrdinalPatterns`	Ordinal patterns
`Dispersion`	Dispersion patterns
`BubbleSortSwaps`	Sorting complexity
`CosineSimilarityBinning`	Cosine similarities histogram

Bias

Estimating the definition by decomposition into a combination of entropy terms, which are estimated independently, will in general be more biased than when using a dedicated estimator. One reason is that this decomposition may miss out on crucial information in the joint space. To remedy this, dedicated information measure estimators typically derive the marginal estimates by first considering the joint space, and then does some clever trick to eliminate the bias that is introduced through a naive decomposition. Unless specified below, no bias correction is applied for EntropyDecomposition.

Handling of overlapping parameters

If there are overlapping parameters between the measure to be estimated, and the lower-level decomposed measures, then the top-level measure parameter takes precedence. For example, if we want to estimate CMIShannon(base = 2) through a decomposition of entropies using the Kraskov(Shannon(base = ℯ)) Shannon entropy estimator, then base = 2 is used.

Info

Not all measures have the property that they can be decomposed into more fundamental information theoretic quantities. For example, MITsallisMartin can be decomposed into a combination of marginal entropies, while MIRenyiSarbu cannot. An error will be thrown if decomposition is not possible.

Discrete entropy decomposition

The second signature is for discrete estimation using DiscreteInfoEstimators, for example PlugIn. The given discretization scheme (typically an OutcomeSpace) controls how the joint/marginals are discretized, and the probabilities estimator pest controls how probabilities are estimated from counts.

Bias

Like for DifferentialInfoEstimator, using a dedicated estimator for the measure in question will be more reliable than using a decomposition estimate. Here's how different discretizations are applied:

ValueBinning. Bin visitation frequencies are counted in the joint space XY, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for both FixedRectangularBinning and RectangularBinning (which adapts the grid to the data). When using FixedRectangularBinning, the range along the first dimension is used as a template for all other dimensions. This is a bit slower than naively binning each marginal, but lessens bias.
OrdinalPatterns. Each timeseries is separately codify-ed according to its ordinal pattern (no bias correction).
Dispersion. Each timeseries is separately codify-ed according to its dispersion pattern (no bias correction).

Examples

Example 1: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with ValueBinning.
Example 2: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with BubbleSortSwaps.
Example 3: MIShannon estimation using decomposition into differental Shannon entropy estimated using the Kraskov estimator.

source

Associations.MIDecomposition — Type

MIDecomposition(definition::MultivariateInformationMeasure, 
    est::MutualInformationEstimator)

Estimate the MultivariateInformationMeasure specified by definition by by decomposing, the measure, if possible, into a combination of mutual information terms. These terms are individually estimated using the given MutualInformationEstimator est, and finally combined to form the final value of the measure.

Usage

Use with association to compute a MultivariateInformationMeasure from input data: association(est::MIDecomposition, x...).
Use with some IndependenceTest to test for independence between variables.

Examples

Example 1: Estimating CMIShannon using a decomposition into MIShannon terms using the KraskovStögbauerGrassberger1 mutual information estimator.

source

Associations.CMIDecomposition — Type

CMIDecomposition(definition::MultivariateInformationMeasure, 
    est::ConditionalMutualInformationEstimator)

Estimate some multivariate information measure specified by definition, by decomposing it into a combination of conditional mutual information terms.

Usage

Use with association to compute a MultivariateInformationMeasure from input data: association(est::CMIDecomposition, x...).
Use with some IndependenceTest to test for independence between variables.

Description

Each of the conditional mutual information terms are estimated using est, which can be any ConditionalMutualInformationEstimator. Finally, these estimates are combined according to the relevant decomposition formula.

This estimator is similar to EntropyDecomposition, but definition is expressed as conditional mutual information terms instead of entropy terms.

Examples

Example 1: Estimating TEShannon by decomposing it into CMIShannon which is estimated using the FPVP estimator.

source

Mutual information estimators

Associations.MutualInformationEstimator — Type

MutualInformationEstimator

The supertype for dedicated MutualInformation estimators.

Concrete implementations

KraskovStögbauerGrassberger1
KraskovStögbauerGrassberger2
GaoOhViswanath
GaoKannanOhViswanath
GaussianMI

source

Associations.KraskovStögbauerGrassberger1 — Type

KSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger1 mutual information estimator (you can use KSG1 for short) is the $I^{(1)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).

Compatible definitions

MIShannon

Usage

Use with association to compute Shannon mutual information from input data.
Use with some IndependenceTest to test for independence between variables.

Keyword arguments

k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
metric_marginals: The distance metric for the marginals for the marginals can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG1(; k = 10), x, y) # should be near 0 (and can be negative)

source

Associations.KraskovStögbauerGrassberger2 — Type

KSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger2 mutual information estimator (you can use KSG2 for short) is the $I^{(2)}$ k-th nearest neighbor estimator from (Kraskov et al., 2004).

Compatible definitions

MIShannon

Usage

Use with association to compute Shannon mutual information from input data.
Use with some IndependenceTest to test for independence between variables.

Keyword arguments

k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
metric_marginals: The distance metric for the marginals for the marginals can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Description

Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.

The KraskovStögbauerGrassberger2 estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.

Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).

Shannon mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as

\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG2(; k = 10), x, y) # should be near 0 (and can be negative)

source

Associations.GaoKannanOhViswanath — Type

GaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(definition = MIShannon(); k = 1, w = 0)

The GaoKannanOhViswanath (Shannon) estimator is designed for estimating Shannon mutual information between variables that may be either discrete, continuous or a mixture of both (Gao et al., 2017).

Compatible definitions

MIShannon

Keyword arguments

k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Usage

Use with association to compute Shannon mutual information from input data.
Use with some IndependenceTest to test for independence between variables.

Description

The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k-nearest neighbor distances from empirical samples.

The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."

Implementation note

In Gao et al. (2017), they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1 estimator for continuous-valued data. However, KraskovStögbauerGrassberger1 uses the digamma function, while GaoKannanOhViswanath uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.

Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KraskovStögbauerGrassberger1 approach. The KraskovStögbauerGrassberger1 estimator uses k-th neighbor distances in the joint space, while the GaoKannanOhViswanath algorithm selects the maximum k-th nearest distances among the two marginal spaces, which are in general not the same as the k-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath. We have modified it in a way that mimics KraskovStögbauerGrassberger1 for continous data. Note that because of using the log function instead of digamma, there will be slight differences between the methods. See the source code for more details.

Explicitly convert your discrete data to floats

Even though the GaoKannanOhViswanath estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSets whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to association.

Examples

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoKannanOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)

source

Associations.GaoOhViswanath — Type

GaoOhViswanath <: MutualInformationEstimator
GaoOhViswanath(definition = MIShannon(); k = 1, w = 0)

The GaoOhViswanath is a mutual information estimator based on nearest neighbors, and is also called the bias-improved-KSG estimator, or BI-KSG, by (Gao et al., 2018).

Compatible definitions

MIShannon

Keyword arguments

k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Usage

Use with association to compute Shannon mutual information from input data.
Use with some IndependenceTest to test for independence between variables.

Description

The estimator is given by

\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]

where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\mathcal{l}_2$-ball.

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)

source

Associations.GaussianMI — Type

GaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)

GaussianMI is a parametric estimator for Shannon mutual information.

Compatible definitions

MIShannon

Usage

Use with association to compute Shannon mutual information from input data.
Use with some IndependenceTest to test for independence between variables.

Description

Given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the $d_x + d_y$-dimensional joint StateSpaceSet XY. If normalize == true, then we follow the approach in Vejmelka & Palus (2008)(Vejmelka and Paluš, 2008) and transform each column in XY to have zero mean and unit standard deviation. If normalize == false, then the algorithm proceeds without normalization.

Next, the C_{XY}, the correlation matrix for the (normalized) joint data XY is computed. The mutual information estimate GaussianMI assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations. Therefore, given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the joint StateSpaceSet XY, then transforms each column in XY to have zero mean and unit standard deviation, and finally computes the \Sigma, the correlation matrix for XY.

The mutual information estimated (for normalize == false) is then estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \dfrac{ \det(\Sigma_X) \det(\Sigma_Y)) }{\det(\Sigma))},\]

where we $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as

\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]

If normalize == true, then the mutual information is estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \sigma_i,\]

where $\sigma_i$ are the eigenvalues for $\Sigma$.

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaussianMI(), x, y) # should be near 0 (and can be negative)

source

Conditional mutual information estimators

Associations.ConditionalMutualInformationEstimator — Type

ConditionalMutualInformationEstimator

The supertype for dedicated ConditionalMutualInformation estimators.

Concrete implementations

FPVP
GaussianCMI
MesnerShalizi
Rahimzamani
PoczosSchneiderCMI

source

Associations.GaussianCMI — Type

GaussianCMI <: MutualInformationEstimator
GaussianCMI(definition = CMIShannon(); normalize::Bool = false)

GaussianCMI is a parametric ConditionalMutualInformationEstimator (Vejmelka and Paluš, 2008).

Compatible definitions

CMIShannon

Usage

Use with association to compute CMIShannon from input data.
Use with some IndependenceTest to test for independence between variables.

Description

GaussianCMI estimates Shannon CMI through a sum of two mutual information terms that each are estimated using GaussianMI (the normalize keyword is the same as for GaussianMI):

\[\hat{I}_{Gaussian}(X; Y | Z) = \hat{I}_{Gaussian}(X; Y, Z) - \hat{I}_{Gaussian}(X; Z)\]

Examples

Example 1. Estimating CMIShannon.

source

Associations.FPVP — Type

FPVP <: ConditionalMutualInformationEstimator
FPVP(definition = CMIShannon(); k = 1, w = 0)

The Frenzel-Pompe-Vejmelka-Paluš (or FPVP for short) ConditionalMutualInformationEstimator is used to estimate the conditional mutual information using a k-th nearest neighbor approach that is analogous to that of the KraskovStögbauerGrassberger1 mutual information estimator from Frenzel and Pompe (2007) and Vejmelka and Paluš (2008).

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

CMIShannon

Usage

Use with association to compute ConditionalMutualInformation measure from input data.
Use with some IndependenceTest to test for independence between variables.

Examples

Example 1: Estimating CMIShannon

source

Associations.MesnerShalizi — Type

MesnerShalizi <: ConditionalMutualInformationEstimator
MesnerShalizi(definition = CMIShannon(); k = 1, w = 0)

The MesnerShalizi ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data (Mesner and Shalizi, 2020).

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

CMIShannon

Usage

Use with association to compute CMIShannon from input data.
Use with some IndependenceTest to test for independence between variables.

Examples

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(MesnerShalizi(; k = 10), x, z, y) # should be near 0 (and can be negative)

source

Associations.Rahimzamani — Type

Rahimzamani <: ConditionalMutualInformationEstimator
Rahimzamani(k = 1, w = 0)

The Rahimzamani ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data (Rahimzamani et al., 2018).

Compatible definitions

CMIShannon

Usage

Use with association to compute a CMIShannon from input data.
Use with some IndependenceTest to test for independence between variables.

Description

This estimator is very similar to the GaoKannanOhViswanath mutual information estimator, but has been expanded to the conditional mutual information case.

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Examples

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(Rahimzamani(; k = 10), x, z, y) # should be near 0 (and can be negative)

source

Associations.PoczosSchneiderCMI — Type

PoczosSchneiderCMI <: ConditionalMutualInformationEstimator
PoczosSchneiderCMI(definition = CMIRenyiPoczos(); k = 1, w = 0)

The PoczosSchneiderCMI ConditionalMutualInformationEstimator computes conditional mutual informations using a k-th nearest neighbor approach (Póczos and Schneider, 2012).

k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.

Compatible definitions

CMIRenyiPoczos

Usage

Use with association to compute CMIRenyiPoczos from input data.
Use with some IndependenceTest to test for independence between variables.

Examples

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(PoczosSchneiderCMI(CMIRenyiPoczos(), k = 10), x, z, y) # should be near 0 (and can be negative)

source

Transfer entropy estimators

Associations.TransferEntropyEstimator — Type

The supertype of all dedicated transfer entropy estimators.

source

Associations.Zhu1 — Type

Zhu1 <: TransferEntropyEstimator
Zhu1(k = 1, w = 0, base = MathConstants.e)

The Zhu1 transfer entropy estimator (Zhu et al., 2015) for normalized input data (as described in Zhu et al. (2015)) for both for pairwise and conditional transfer entropy.

Compatible definitions

TEShannon

Usage

Use with association to compute TEShannon from input data.
Use with some IndependenceTest to test for independence between variables.

Description

This estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x using using k nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh et al. (2003).

w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).

Description

For a given points in the joint embedding space jᵢ, this estimator first computes the distance dᵢ from jᵢ to its k-th nearest neighbor. Then, for each point mₖ[i] in the k-th marginal space, it counts the number of points within radius dᵢ.

The Shannon transfer entropy is then computed as

\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]

where the index k references the three marginal subspaces T, TTf and ST for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner et al. (2011)'s equation 5 and 6).

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Zhu1(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)

source

Associations.Lindner — Type

Lindner <: TransferEntropyEstimator
Lindner(definition = Shannon(); k = 1, w = 0, base = 2)

The Lindner transfer entropy estimator (Lindner et al., 2011), which is also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searches.

Compatible definitions

TEShannon

Usage

Use with association to compute TEShannon from input data.
Use with some IndependenceTest to test for independence between variables.

Keyword parameters

The estimator can be used both for pairwise and conditional transfer entropy estimation.

Description

The Shannon transfer entropy is then computed as

\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]

Example

using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Lindner(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)

source

Associations.SymbolicTransferEntropy — Type

SymbolicTransferEntropy <: TransferEntropyEstimator
SymbolicTransferEntropy(definition = TEShannon(); m = 3, τ = 1, 
    lt = ComplexityMeasures.isless_rand

A convenience estimator for symbolic transfer entropy (Staniek and Lehnertz, 2008).

Compatible measures

TEShannon

Description

Symbolic transfer entropy consists of two simple steps. First, the input time series are encoded using codify with the CodifyVariables discretization and the OrdinalPatterns outcome space. This transforms the input time series into integer time series. Transfer entropy entropy is then estimated from the encoded time series by applying

Transfer entropy is then estimated as usual on the encoded timeseries with the embedding dictated by definition and the JointProbabilities estimator.

Examples

Example 1

source

Associations.Hilbert — Type

Hilbert(est;
    source::InstantaneousSignalProperty = Phase(),
    target::InstantaneousSignalProperty = Phase(),
    cond::InstantaneousSignalProperty = Phase())
) <: TransferDifferentialEntropyEstimator

Compute transfer entropy on instantaneous phases/amplitudes of relevant signals, which are obtained by first applying the Hilbert transform to each signal, then extracting the phases/amplitudes of the resulting complex numbers (Paluš, 2014). Original time series are thus transformed to instantaneous phase/amplitude time series. Transfer entropy is then estimated using the provided est on those phases/amplitudes (use e.g. ValueBinning, or OrdinalPatterns).

Info

Details on estimation of the transfer entropy (conditional mutual information) following the phase/amplitude extraction step is not given in Palus (2014). Here, after instantaneous phases/amplitudes have been obtained, these are treated as regular time series, from which transfer entropy is then computed as usual.

A small tutorial

Associations.jl extends the single-variate information API in ComplexityMeasures.jl to information measures of multiple variables.

Definitions

We define "information measure" as some functional of probability mass functions or probability densities. This definition may or may not agree with literature usage, depending on the context. We made this choice pragmatically based on user-friendlyness and coding-friendlyness, but still trying to maintain some level of meaningful terminology.

A note on naming: the same name for different things?

Upon doing a literature review on the possible variants of information theoretic measures, it become painstakingly obvious that authors use the same name for different concepts. For novices, and experienced practitioners too, this can be confusing. Our API clearly distinguishes between methods that are conceptually the same but named differently in the literature due to differing estimation strategies, from methods that actually have different definitions.

Multiple, equivalent definitions occur for example for the Shannon mutual information (MI; MIShannon), which has both a discrete and continuous version, and there there are multiple equivalent mathematical formulas for them: a direct sum/integral over a joint probability mass function (pmf), as a sum of three entropy terms, and as a Kullback-Leibler divergence between the joint pmf and the product of the marginal distributions. Since these definitions are all equivalent, we only need once type (MIShannon) to represent them.
But Shannon MI is not the only type of mutual information! For example, "Tsallis mutual information" has been proposed in different variants by various authors. Despite sharing the same name, these are actually nonequivalent definitions. We've thus assigned them entirely different measure names (e.g. MITsallisFuruichi and MITsallisMartin), with the author name at the end.

Basic estimation strategy

To estimate a multivariate information measure in practice, you must first specify the definition of the measure, which is then used as input to an estimator of that measure. This estimator is then given to association. Every information measure has at least one estimator: JointProbabilities. Many measures have several additional estimators, and an overview can be found in the docstring for MultivariateInformationMeasureEstimator.

Distances/divergences

There are many information measures in the literature that aim to quantify the distance/divergence between two probability mass functions (pmf) or densities. You can find those that we implement here.

As an example, let's quantify the KLDivergence between two probability mass functions estimated by symbolizing two input vectors x and y using OrdinalPatterns. Since the discrete KLDivergence can be expressed as a function of a joint pmf, we can use the JointProbabilities estimator.

using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)

# Specify a discretization. We discretize per column.
disc = CodifyVariables(OrdinalPatterns(m=2))

def = KLDivergence()
est = JointProbabilities(def, disc)
association(est, x, y) # should be close to 0

4.625912964358308e-5

Divergences are examples of asymmetric information measures, which we can see by flipping the order of the input data.

association(est, y, x)

4.625888243106873e-5

Conditional entropies

Conditional entropies are another example of asymmetric information measures. They all have in common that they are functions of a joint pmf, and can therefore also be estimated using the JointProbabilities estimator. This time, we'll use a rectangular binning with 3 bins along each dimension to discretize the data.

using Associations
using Random; rng = Xoshiro(1234)

x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(ValueBinning(3))
def = ConditionalEntropyShannon(base = 2)
est = JointProbabilities(def, disc)
association(est, x, y)

1.17479123360062

Joint entropies

Joint entropies, on the other hand, are symmetric. Joint entropies are functionals of a joint pmf, so we can still use the JointProbabilities estimator. This time, we use a Dispersion based discretization.

using Associations
using Random; rng = Xoshiro(1234)

x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(Dispersion())
est = JointProbabilities(JointEntropyShannon(base = 2), disc)
association(est, x, y) ≈ association(est, y, x) # should be true

true

Mutual informations

Mutual informations, in particular MIShannon is an often-used symmetric measure for quantifing the (possibly nonlinear) association between variables. It appears in both discrete and differential form, and can be estimated in a multitude of ways. For example, one can use dedicated MutualInformationEstimators such as KraskovStögbauerGrassberger2 or GaussianMI:

using Associations
using StateSpaceSets
# We'll construct two state space sets, to illustrate how we can discretize
# multidimensional inputs using `CodifyPoints`.
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 2))
est = KSG1(MIShannon(base = 2), k = 10)
association(est, x, y)

-0.021186834467335727

The result should be symmetric:

association(est, x, y) ≈ association(est, y, x) # should be true

true

One can also estimate mutual information using the EntropyDecomposition estimator, or (like above) using the JointProbabilities estimator. Let's construct a differential entropy based estimator based on the Kraskov estimator.

est_diff = EntropyDecomposition(MIShannon(base = 2), Kraskov(Shannon(), k=10))

EntropyDecomposition{MIShannon{Int64}, Kraskov{Shannon{Int64}, Int64}, Nothing, Nothing}(MIShannon{Int64}(2), Kraskov(definition = Shannon(base = 2), k = 10, w = 0, base = 2), nothing, nothing)

association(est_diff, x, y)

-0.3826780635557282

We can also construct a discrete entropy based estimator based on e.g. PlugIn estimator of Shannon entropy.

# We know that `x` and `y` were generated from a uniform distribution above,
# so we set the minimum and maximum values of the encoding to 0 and 1,
# respectively.
encoding = RelativeMeanEncoding(0.0, 1.0; n = 4)
disc = CodifyPoints(encoding)
est_disc = JointProbabilities(MIShannon(base = 2), disc)
association(est_disc, x, y)

0.00494036679990649

JointProbabilities: fine grained discretization control

For numerical data, we can estimate both counts and probabilities using CodifyVariables with any count-based OutcomeSpace. Here, we'll estimate MIShannon using one type of encoding for the first variable, and another type of encoding for the second variable.

using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)

# We must use outcome spaces with the same number of total outcomes.
# This works becuase these outcome spaces construct embedding points
# ("windows") in the same way/order; be careful if that isn't the case!
ox = CosineSimilarityBinning(nbins = factorial(3))
oy = OrdinalPatterns(m = 3)

# Now estimate mutual information
discretization = CodifyVariables((ox, oy))
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)

0.10752906020545049

For more fine-grained control than CodifyVariables can offer, we can use CodifyPoints with one or several Encodings. Here's how we can estimate MIShannon one multivariate input data by discretizing each input variable in arbitrary ways.

using Associations
using Random; rng = Xoshiro(1234)
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 3))

 # min/max of the `rand` call is 0 and 1
precise = true # precise bin edges
r = range(0, 1; length = 3)
binning = FixedRectangularBinning(r, dimension(x), precise)
encoding_x = RectangularBinEncoding(binning, x)
encoding_y = CombinationEncoding(RelativeMeanEncoding(0.0, 1, n = 2), OrdinalPatternEncoding(3))
discretization = CodifyPoints(encoding_x, encoding_y)

# Now estimate mutual information
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)

0.025991777755365167