Multivariate information API
ComplexityMeasures.information — Method
information(est::MultivariateInformationMeasureEstimator, x...)
Estimate some MultivariateInformationMeasure on input data x..., using the given MultivariateInformationMeasureEstimator.
This is just a convenience wrapper around association(est, x...).
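A minimal sketch of this equivalence, assuming an illustrative JointProbabilities estimator like those used elsewhere in these docs:
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# Any MultivariateInformationMeasureEstimator works here; this choice is just for illustration.
est = JointProbabilities(MIShannon(), CodifyVariables(OrdinalPatterns(m = 3)))
information(est, x, y) == association(est, x, y) # true: `information` simply forwards to `association`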
Associations.MultivariateInformationMeasureEstimator — Type
MultivariateInformationMeasureEstimator
The supertype for all estimators of multivariate information measures.
Generic implementations
Dedicated implementations
- KraskovStögbauerGrassberger1
- KraskovStögbauerGrassberger2
- GaoOhViswanath
- GaoKannanOhViswanath
- GaussianMI
Estimators
Generic estimators
We provide a set of generic estimators, each of which can be used to compute several types of information measures.
Associations.JointProbabilities — Type
JointProbabilities <: InformationMeasureEstimator
JointProbabilities(
    definition::MultivariateInformationMeasure,
    discretization::Discretization
)
JointProbabilities is a generic estimator for multivariate discrete information measures.
Usage
- Use with association to compute an information measure from input data.
Description
It first encodes the input data according to the given discretization, then constructs probs, a multidimensional Probabilities instance. Finally, probs is forwarded to a PlugIn estimator, which computes the measure according to definition.
Compatible encoding schemes
- CodifyVariables (encode each variable/column of the input data independently by applying an encoding in a sliding window over each input variable).
- CodifyPoints (encode each point/column of the input data).
Works for any OutcomeSpace that implements codify.
Using JointProbabilities to compute an information measure, e.g. conditional mutual information, is typically slower than dedicated estimation procedures like EntropyDecomposition. The reason is that measures such as CMIShannon can be formulated as a sum of four entropies, which can be estimated individually and summed afterwards. This decomposition is fast because we avoid explicitly estimating the entire joint pmf, which demands many extra calculation steps. However, the decomposition is biased, because it fails to fully take into consideration the joint relationships between the variables. Pick your estimator according to your needs.
See also: Counts, Probabilities, ProbabilitiesEstimator, OutcomeSpace, DiscreteInfoEstimator.
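A minimal sketch of this workflow, with an illustrative discretization choice:
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
disc = CodifyVariables(OrdinalPatterns(m = 3)) # encode each variable by its ordinal patterns
est = JointProbabilities(MIShannon(base = 2), disc)
association(est, x, y) # discrete Shannon mutual information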
Associations.EntropyDecomposition — Type
EntropyDecomposition(definition::MultivariateInformationMeasure,
    est::DifferentialInfoEstimator)
EntropyDecomposition(definition::MultivariateInformationMeasure,
    est::DiscreteInfoEstimator,
    discretization::CodifyVariables{<:OutcomeSpace},
    pest::ProbabilitiesEstimator = RelativeAmount())
Estimate the multivariate information measure specified by definition by rewriting its formula into some combination of entropy terms.
Estimation
EntropyDecomposition allows using any InformationMeasureEstimator from ComplexityMeasures.jl to estimate multivariate information measures.
The estimate is obtained by computing the individual entropy terms using est, then combining them according to definition to form the final value.
Bias
Estimating the definition by decomposing it into a combination of entropy terms, each estimated independently, will in general be more biased than using a dedicated estimator. One reason is that this decomposition may miss out on crucial information in the joint space. To remedy this, dedicated information measure estimators typically derive the marginal estimates by first considering the joint space, and then apply some clever trick to eliminate the bias introduced by a naive decomposition. Unless specified below, no bias correction is applied for EntropyDecomposition.
Usage
- Use with association to compute a MultivariateInformationMeasure from input data: association(est::EntropyDecomposition, x...).
- Use with some IndependenceTest to test for independence between variables.
Description
If est is a DifferentialInfoEstimator, then discretization and pest  are ignored. 
Differential estimation
If using the first signature, any compatible DifferentialInfoEstimator can be used. For example, MITsallisMartin can be estimated through a decomposition into entropy terms, using any estimator from ComplexityMeasures.jl that can estimate differential Tsallis entropy.
Discrete estimation
If est is a DiscreteInfoEstimator, then discretization and a probabilities estimator pest must also be provided (pest defaults to RelativeAmount, which uses naive plug-in probabilities). This will always discretize the input data per variable/column, and then encode each discretized column into integers using codify. A sketch of this signature is given after the table below.
The following OutcomeSpaces can be used for discretisation.  Note that not all outcome spaces will work with all measures.
| Estimator | Principle | 
|---|---|
| UniqueElements | Count of unique elements | 
| ValueBinning | Binning (histogram) | 
| OrdinalPatterns | Ordinal patterns | 
| Dispersion | Dispersion patterns | 
| BubbleSortSwaps | Sorting complexity | 
| CosineSimilarityBinning | Cosine similarities histogram | 
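A minimal sketch of the discrete signature, with an illustrative outcome space choice:
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
disc = CodifyVariables(ValueBinning(3))      # discretize each variable into 3 bins
est = EntropyDecomposition(MIShannon(), PlugIn(Shannon()), disc) # pest defaults to RelativeAmount()
association(est, x, y)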
Handling of overlapping parameters
If there are overlapping parameters between the measure to be estimated, and the lower-level decomposed measures, then the top-level measure parameter takes precedence. For example, if we want to estimate CMIShannon(base = 2) through a decomposition  of entropies using the Kraskov(Shannon(base = ℯ)) Shannon entropy estimator, then base = 2 is used.
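A minimal sketch of this precedence, assuming uniformly distributed test data:
using Associations
using Random; rng = MersenneTwister(1234)
x, y, z = rand(rng, 1000), rand(rng, 1000), rand(rng, 1000)
# The top-level `base = 2` overrides the `base = ℯ` given to the entropy estimator.
est = EntropyDecomposition(CMIShannon(base = 2), Kraskov(Shannon(base = ℯ), k = 5))
association(est, x, y, z)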
Not all measures have the property that they can be decomposed into more fundamental information theoretic quantities. For example, MITsallisMartin can be  decomposed into a combination of marginal entropies, while MIRenyiSarbu cannot. An error will be thrown if decomposition is not possible.
Discrete entropy decomposition
The second signature is for discrete estimation using DiscreteInfoEstimators, for example PlugIn. The given discretization scheme (typically an  OutcomeSpace) controls how the joint/marginals are discretized, and the probabilities estimator pest controls how probabilities are estimated from counts.
Like for DifferentialInfoEstimator, using a dedicated estimator  for the measure in question will be more reliable than using a decomposition estimate. Here's how different discretizations are applied:
- ValueBinning. Bin visitation frequencies are counted in the joint space XY, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for both FixedRectangularBinning and RectangularBinning (which adapts the grid to the data). When using FixedRectangularBinning, the range along the first dimension is used as a template for all other dimensions. This is a bit slower than naively binning each marginal, but lessens bias.
- OrdinalPatterns. Each timeseries is separately codify-ed according to its ordinal pattern (no bias correction).
- Dispersion. Each timeseries is separately codify-ed according to its dispersion pattern (no bias correction).
Examples
- Example 1: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with ValueBinning.
- Example 2: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with BubbleSortSwaps.
- Example 3: MIShannon estimation using decomposition into differential Shannon entropy estimated using the Kraskov estimator.
See also: MutualInformationEstimator, MultivariateInformationMeasure.
Associations.MIDecomposition — Type
MIDecomposition(definition::MultivariateInformationMeasure, 
    est::MutualInformationEstimator)
Estimate the MultivariateInformationMeasure specified by definition by decomposing the measure, if possible, into a combination of mutual information terms. These terms are individually estimated using the given MutualInformationEstimator est, and finally combined to form the final value of the measure.
Usage
- Use with association to compute a MultivariateInformationMeasure from input data: association(est::MIDecomposition, x...).
- Use with some IndependenceTest to test for independence between variables.
Examples
- Example 1: Estimating CMIShannon using a decomposition into MIShannon terms using the KraskovStögbauerGrassberger2 mutual information estimator.
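A minimal sketch of such a decomposition; the estimator settings are only illustrative:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = MIDecomposition(CMIShannon(), KSG1(; k = 10))
association(est, x, z, y) # should be near 0 (and can be negative)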
See also: MultivariateInformationMeasureEstimator.
Associations.CMIDecomposition — Type
CMIDecomposition(definition::MultivariateInformationMeasure, 
    est::ConditionalMutualInformationEstimator)
Estimate some multivariate information measure specified by definition, by decomposing it into a combination of conditional mutual information terms.
Usage
- Use with association to compute a MultivariateInformationMeasure from input data: association(est::CMIDecomposition, x...).
- Use with some IndependenceTest to test for independence between variables.
Description
Each of the conditional mutual information terms is estimated using est, which can be any ConditionalMutualInformationEstimator. Finally, these estimates are combined according to the relevant decomposition formula.
This estimator is similar to EntropyDecomposition, but definition is expressed as  conditional mutual information terms instead of entropy terms.
Examples
- Example 1: Estimating TEShannon by decomposing it into CMIShannon, which is estimated using the FPVP estimator.
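A minimal sketch of this decomposition; the FPVP settings are only illustrative:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
est = CMIDecomposition(TEShannon(), FPVP(; k = 10))
association(est, x, y) # transfer entropy from x to y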
See also: ConditionalMutualInformationEstimator,  MultivariateInformationMeasureEstimator, MultivariateInformationMeasure.
Mutual information estimators
Associations.MutualInformationEstimator — Type
MutualInformationEstimator
The supertype for dedicated MutualInformation estimators.
Concrete implementations
Associations.KraskovStögbauerGrassberger1 — Type
KSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger1 mutual information estimator (you can use KSG1 for short) is the $I^{(1)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).
Compatible definitions
Usage
- Use with association to compute Shannon mutual information from input data.
- Use with some IndependenceTest to test for independence between variables.
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals, which can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG1(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.KraskovStögbauerGrassberger2 — Type
KSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger2 mutual information estimator (you can use KSG2 for short) is the $I^{(2)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).
Compatible definitions
Usage
- Use with association to compute Shannon mutual information from input data.
- Use with some IndependenceTest to test for independence between variables.
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals, which can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Description
Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.
The KraskovStögbauerGrassberger2 estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.
Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).
Shannon mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as
\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG2(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaoKannanOhViswanath — Type
GaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(definition = MIShannon(); k = 1, w = 0)
The GaoKannanOhViswanath (Shannon) estimator is designed for estimating Shannon mutual information between variables that may be either discrete, continuous or a mixture of both (Gao et al., 2017).
Compatible definitions
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Usage
- Use with association to compute Shannon mutual information from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k-nearest neighbor distances from empirical samples.
The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."
In Gao et al. (2017), they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger2 estimator for continuous-valued data. However, KraskovStögbauerGrassberger2 uses the digamma function, while GaoKannanOhViswanath uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.
Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KraskovStögbauerGrassberger1 approach. The KraskovStögbauerGrassberger1 estimator uses k-th neighbor distances in the joint space, while the GaoKannanOhViswanath algorithm selects the maximum k-th nearest distances among the two marginal spaces, which are in general not the same as the k-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath. We have modified it in a way that mimics KraskovStögbauerGrassberger2 for continuous data. Note that because of using the log function instead of digamma, there will be slight differences between the methods. See the source code for more details.
Even though the GaoKannanOhViswanath estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSets whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to association.
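For instance, a hypothetical discrete variable (here: characters) could be encoded as floats before estimation, along the following lines:
using Associations
using Random; rng = MersenneTwister(1234)
s = rand(rng, ['a', 'b', 'c'], 10000)            # discrete data
codes = Dict('a' => 1.0, 'b' => 2.0, 'c' => 3.0)
x = [codes[sᵢ] for sᵢ in s]                      # float-valued encoding of the discrete variable
y = x .+ rand(rng, 10000)                        # continuous variable that depends on x
association(GaoKannanOhViswanath(; k = 10), x, y) # mixed discrete/continuous input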
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoKannanOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaoOhViswanath — Type
GaoOhViswanath <: MutualInformationEstimator
GaoOhViswanath(definition = MIShannon(); k = 1, w = 0)
The GaoOhViswanath estimator is a mutual information estimator based on nearest neighbors, also called the bias-improved-KSG estimator, or BI-KSG (Gao et al., 2018).
Compatible definitions
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Usage
- Use with association to compute Shannon mutual information from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
The estimator is given by
\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]
where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\mathcal{l}_2$-ball.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaussianMI — Type
GaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)
GaussianMI is a parametric estimator for Shannon mutual information.
Compatible definitions
Usage
- Use with association to compute Shannon mutual information from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
Given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the $(d_x + d_y)$-dimensional joint StateSpaceSet XY. If normalize == true, then we follow the approach in Vejmelka and Paluš (2008) and transform each column in XY to have zero mean and unit standard deviation. If normalize == false, then the algorithm proceeds without normalization.
Next, the correlation matrix $\Sigma$ of the (possibly normalized) joint data XY is computed. GaussianMI assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations.
The mutual information (for normalize == false) is then estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \log \left( \dfrac{ \det(\Sigma_X) \det(\Sigma_Y) }{\det(\Sigma)} \right),\]
where $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as
\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]
If normalize == true, then the mutual information is estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \log(\sigma_i),\]
where $\sigma_i$ are the eigenvalues for $\Sigma$.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaussianMI(), x, y) # should be near 0 (and can be negative)
Conditional mutual information estimators
Associations.ConditionalMutualInformationEstimator — Type
ConditionalMutualInformationEstimator
The supertype for dedicated ConditionalMutualInformation estimators.
Concrete implementations
Associations.GaussianCMI — Type
GaussianCMI <: ConditionalMutualInformationEstimator
GaussianCMI(definition = CMIShannon(); normalize::Bool = false)
GaussianCMI is a parametric ConditionalMutualInformationEstimator (Vejmelka and Paluš, 2008).
Compatible definitions
Usage
- Use with association to compute CMIShannon from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
GaussianCMI estimates Shannon CMI through a sum of two mutual information terms that each are estimated using GaussianMI (the normalize keyword is the same as for GaussianMI):
\[\hat{I}_{Gaussian}(X; Y | Z) = \hat{I}_{Gaussian}(X; Y, Z) - \hat{I}_{Gaussian}(X; Z)\]
Examples
- Example 1. Estimating CMIShannon.
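A minimal sketch, mirroring the examples given for the other estimators:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(GaussianCMI(), x, z, y) # should be near 0 (and can be negative)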
Associations.FPVP — Type
FPVP <: ConditionalMutualInformationEstimator
FPVP(definition = CMIShannon(); k = 1, w = 0)
The Frenzel-Pompe-Vejmelka-Paluš (or FPVP for short) ConditionalMutualInformationEstimator is used to estimate the conditional mutual information using a k-th nearest neighbor approach that is analogous to that of the KraskovStögbauerGrassberger2 mutual information estimator from Frenzel and Pompe (2007) and Vejmelka and Paluš (2008).
k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with association to compute a ConditionalMutualInformation measure from input data.
- Use with some IndependenceTest to test for independence between variables.
Examples
- Example 1: Estimating CMIShannon
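A minimal sketch, mirroring the examples given for the other estimators:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(FPVP(; k = 10), x, z, y) # should be near 0 (and can be negative)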
Associations.MesnerShalizi — Type
MesnerShalizi <: ConditionalMutualInformationEstimator
MesnerShalizi(definition = CMIShannon(); k = 1, w = 0)
The MesnerShalizi ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data (Mesner and Shalizi, 2020).
k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with association to compute CMIShannon from input data.
- Use with some IndependenceTest to test for independence between variables.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(MesnerShalizi(; k = 10), x, z, y) # should be near 0 (and can be negative)
Associations.Rahimzamani — Type
Rahimzamani <: ConditionalMutualInformationEstimator
Rahimzamani(k = 1, w = 0)
The Rahimzamani ConditionalMutualInformationEstimator is designed for data that can be mixtures of discrete and continuous data (Rahimzamani et al., 2018).
Compatible definitions
Usage
- Use with association to compute CMIShannon from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
This estimator is very similar to the GaoKannanOhViswanath mutual information estimator, but has been expanded to the conditional mutual information case.
k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(Rahimzamani(; k = 10), x, z, y) # should be near 0 (and can be negative)
Associations.PoczosSchneiderCMI — Type
PoczosSchneiderCMI <: ConditionalMutualInformationEstimator
PoczosSchneiderCMI(definition = CMIRenyiPoczos(); k = 1, w = 0)
The PoczosSchneiderCMI ConditionalMutualInformationEstimator computes conditional mutual informations using a k-th nearest neighbor approach (Póczos and Schneider, 2012).
k is the number of nearest neighbors. w is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with association to compute CMIRenyiPoczos from input data.
- Use with some IndependenceTest to test for independence between variables.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(PoczosSchneiderCMI(CMIRenyiPoczos(), k = 10), x, z, y) # should be near 0 (and can be negative)
Transfer entropy estimators
Associations.TransferEntropyEstimator — Type
The supertype of all dedicated transfer entropy estimators.
Associations.Zhu1 — Type
Zhu1 <: TransferEntropyEstimator
Zhu1(k = 1, w = 0, base = MathConstants.e)
The Zhu1 transfer entropy estimator for normalized input data, as described in Zhu et al. (2015), for both pairwise and conditional transfer entropy.
Compatible definitions
Usage
- Use with association to compute TEShannon from input data.
- Use with some IndependenceTest to test for independence between variables.
Description
This estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x using k nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension of the entropy estimator in Singh et al. (2003).
w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).
For a given point jᵢ in the joint embedding space, this estimator first computes the distance dᵢ from jᵢ to its k-th nearest neighbor. Then, for each point mₖ[i] in the k-th marginal space, it counts the number of points within radius dᵢ.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k references the three marginal subspaces T, TTf and ST for which neighbor searches are performed. Here, this estimator has been modified to also allow conditioning (a simple modification to equations 5 and 6 in Lindner et al. (2011)).
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Zhu1(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
Associations.Lindner — Type
Lindner <: TransferEntropyEstimator
Lindner(definition = Shannon(); k = 1, w = 0, base = 2)
The Lindner transfer entropy estimator (Lindner et al., 2011), which is also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searches.
Compatible definitions
Usage
- Use with association to compute TEShannon from input data.
- Use with some IndependenceTest to test for independence between variables.
Keyword parameters
w is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0, meaning that only the point itself is excluded when searching for neighbours).
The estimator can be used both for pairwise and conditional transfer entropy estimation.
Description
For a given point jᵢ in the joint embedding space, this estimator first computes the distance dᵢ from jᵢ to its k-th nearest neighbor. Then, for each point mₖ[i] in the k-th marginal space, it counts the number of points within radius dᵢ.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k references the three marginal subspaces T, TTf and ST for which neighbor searches are performed. Here, this estimator has been modified to also allow conditioning (a simple modification to equations 5 and 6 in Lindner et al. (2011)).
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Lindner(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
Associations.SymbolicTransferEntropy — Type
SymbolicTransferEntropy <: TransferEntropyEstimator
SymbolicTransferEntropy(definition = TEShannon(); m = 3, τ = 1, 
    lt = ComplexityMeasures.isless_rand)
A convenience estimator for symbolic transfer entropy (Staniek and Lehnertz, 2008).
Compatible measures
Description
Symbolic transfer entropy consists of two simple steps. First, the input time series are encoded using codify with the CodifyVariables discretization and the OrdinalPatterns outcome space. This transforms the input time series into integer time series. Transfer entropy is then estimated as usual on the encoded time series, with the embedding dictated by definition, using the JointProbabilities estimator.
Examples
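A minimal sketch; the keyword settings are only illustrative:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
est = SymbolicTransferEntropy(TEShannon(); m = 3)
association(est, x, y) # transfer entropy from x to y on the ordinal-encoded series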
Associations.Hilbert — Type
Hilbert(est;
    source::InstantaneousSignalProperty = Phase(),
    target::InstantaneousSignalProperty = Phase(),
    cond::InstantaneousSignalProperty = Phase()
) <: TransferDifferentialEntropyEstimator
Compute transfer entropy on instantaneous phases/amplitudes of relevant signals, which are obtained by first applying the Hilbert transform to each signal, then extracting the phases/amplitudes of the resulting complex numbers (Paluš, 2014). Original time series are thus transformed to instantaneous phase/amplitude time series. Transfer entropy is then estimated using the provided est on those phases/amplitudes (use e.g. ValueBinning, or OrdinalPatterns).
Details on estimation of the transfer entropy (conditional mutual information) following the phase/amplitude extraction step are not given in Paluš (2014). Here, after instantaneous phases/amplitudes have been obtained, these are treated as regular time series, from which transfer entropy is then computed as usual.
Associations.Phase — Type
Phase <: InstantaneousSignalProperty
Indicates that the instantaneous phases of a signal should be used.
Associations.Amplitude — Type
Amplitude <: InstantaneousSignalProperty
Indicates that the instantaneous amplitudes of a signal should be used.
A small tutorial
Associations.jl extends the single-variate information API in ComplexityMeasures.jl to information measures of multiple variables.
Definitions
We define "information measure" as some functional of probability mass functions or probability densities. This definition may or may not agree with literature usage, depending on the context. We made this choice pragmatically based on user-friendlyness and coding-friendlyness, but still trying to maintain some level of meaningful terminology.
Upon doing a literature review of the possible variants of information theoretic measures, it became painfully obvious that authors use the same name for different concepts. For novices, and experienced practitioners too, this can be confusing. Our API clearly distinguishes between methods that are conceptually the same but named differently in the literature due to differing estimation strategies, and methods that actually have different definitions.
- Multiple, equivalent definitions occur for example for the Shannon mutual information (MI; MIShannon), which has both a discrete and continuous version, and there are multiple equivalent mathematical formulas for them: a direct sum/integral over a joint probability mass function (pmf), a sum of three entropy terms, and a Kullback-Leibler divergence between the joint pmf and the product of the marginal distributions. Since these definitions are all equivalent, we only need one type (MIShannon) to represent them.
- But Shannon MI is not the only type of mutual information! For example, "Tsallis mutual information" has been proposed in different variants by various authors. Despite sharing the same name, these are actually nonequivalent definitions. We've thus assigned them entirely different measure names (e.g. MITsallisFuruichi and MITsallisMartin), with the author name at the end.
Basic estimation strategy
To estimate a multivariate information measure in practice, you must first specify the definition of the measure, which is then used as input to an  estimator of that measure. This estimator is then given to association. Every information measure has at least one estimator: JointProbabilities. Many measures have several additional estimators, and an overview can be found in the docstring for MultivariateInformationMeasureEstimator.
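A minimal sketch of this workflow; the measure and discretization here are just illustrative choices:
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
def = MIShannon(base = 2)                                               # 1. the measure definition
est = JointProbabilities(def, CodifyVariables(OrdinalPatterns(m = 3)))  # 2. an estimator of it
association(est, x, y)                                                  # 3. compute the estimate from data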
Distances/divergences
There are many information measures in the literature that aim to quantify the distance/divergence between two probability mass functions (pmf) or densities. You can find those that we implement here.
As an example, let's quantify the KLDivergence between two probability  mass functions estimated by symbolizing two input vectors x and y using  OrdinalPatterns. Since the discrete KLDivergence can be  expressed as a function of a joint pmf, we can use the JointProbabilities estimator.
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# Specify a discretization. We discretize per column.
disc = CodifyVariables(OrdinalPatterns(m=2))
def = KLDivergence()
est = JointProbabilities(def, disc)
association(est, x, y) # should be close to 0
4.625912964358308e-5
Divergences are examples of asymmetric information measures, which we can see by flipping the order of the input data.
association(est, y, x)
4.625888243106873e-5
Conditional entropies
Conditional entropies are another example of asymmetric information measures. They all have in common that  they are functions of a joint pmf, and can therefore also be estimated using the JointProbabilities estimator. This time, we'll use a rectangular binning with 3 bins along each dimension to discretize the data.
using Associations
using Random; rng = Xoshiro(1234)
x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(ValueBinning(3))
def = ConditionalEntropyShannon(base = 2)
est = JointProbabilities(def, disc)
association(est, x, y)
1.17479123360062
Joint entropies
Joint entropies, on the other hand, are symmetric. Joint entropies are functionals of a joint pmf, so we can still use the JointProbabilities estimator. This time, we use a Dispersion based discretization.
using Associations
using Random; rng = Xoshiro(1234)
x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(Dispersion())
est = JointProbabilities(JointEntropyShannon(base = 2), disc)
association(est, x, y) ≈ association(est, y, x) # should be true
true
Mutual informations
Mutual informations, in particular MIShannon, are often-used symmetric measures for quantifying the (possibly nonlinear) association between variables. Shannon mutual information appears in both discrete and differential form, and can be estimated in a multitude of ways. For example, one can use dedicated MutualInformationEstimators such as KraskovStögbauerGrassberger2 or GaussianMI:
using Associations
using StateSpaceSets
using Random; rng = Xoshiro(1234)
# We'll construct two state space sets, to illustrate how we can discretize
# multidimensional inputs using `CodifyPoints`.
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 2))
est = KSG1(MIShannon(base = 2), k = 10)
association(est, x, y)
-0.021186834467335727
The result should be symmetric:
association(est, x, y) ≈ association(est, y, x) # should be true
true
One can also estimate mutual information using the EntropyDecomposition estimator, or (like above) using the JointProbabilities estimator. Let's construct a differential entropy based estimator based on the Kraskov estimator.
est_diff = EntropyDecomposition(MIShannon(base = 2), Kraskov(Shannon(), k = 10))
EntropyDecomposition{MIShannon{Int64}, Kraskov{Shannon{Int64}, Int64}, Nothing, Nothing}(MIShannon{Int64}(2), Kraskov(definition = Shannon(base = 2), k = 10, w = 0, base = 2), nothing, nothing)
association(est_diff, x, y)
-0.3826780635557282
We can also construct a discrete entropy based estimator, based on e.g. the PlugIn estimator of Shannon entropy.
# We know that `x` and `y` were generated from a uniform distribution above,
# so we set the minimum and maximum values of the encoding to 0 and 1,
# respectively.
encoding = RelativeMeanEncoding(0.0, 1.0; n = 4)
disc = CodifyPoints(encoding)
est_disc = JointProbabilities(MIShannon(base = 2), disc)
association(est_disc, x, y)
0.00494036679990649
JointProbabilities: fine-grained discretization control
For numerical data, we can estimate both counts and probabilities using CodifyVariables with any count-based OutcomeSpace. Here, we'll estimate MIShannon using  one type of encoding for the first variable, and another type of encoding for the second variable.
using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
# We must use outcome spaces with the same number of total outcomes.
# This works because these outcome spaces construct embedding points
# ("windows") in the same way/order; be careful if that isn't the case!
ox = CosineSimilarityBinning(nbins = factorial(3))
oy = OrdinalPatterns(m = 3)
# Now estimate mutual information
discretization = CodifyVariables((ox, oy))
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)
0.10752906020545049
For more fine-grained control than CodifyVariables can offer, we can use CodifyPoints with one or several Encodings. Here's how we can estimate MIShannon on multivariate input data by discretizing each input variable in arbitrary ways.
using Associations
using Random; rng = Xoshiro(1234)
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 3))
 # min/max of the `rand` call is 0 and 1
precise = true # precise bin edges
r = range(0, 1; length = 3)
binning = FixedRectangularBinning(r, dimension(x), precise)
encoding_x = RectangularBinEncoding(binning, x)
encoding_y = CombinationEncoding(RelativeMeanEncoding(0.0, 1, n = 2), OrdinalPatternEncoding(3))
discretization = CodifyPoints(encoding_x, encoding_y)
# Now estimate mutual information
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)
0.025991777755365167