Multivariate information API
ComplexityMeasures.information
— Methodinformation(est::MultivariateInformationMeasureEstimator, x...)
Estimate some MultivariateInformationMeasure
on input data x...
, using the given MultivariateInformationMeasureEstimator
.
This is just a convenience wrapper around association
(est, x...)
.
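A minimal sketch of the wrapper (assuming `information` is re-exported by Associations, and using the JointProbabilities estimator and OrdinalPatterns outcome space documented further down this page; the measure choice is arbitrary):
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# Any MultivariateInformationMeasureEstimator can be passed here.
est = JointProbabilities(JointEntropyShannon(), CodifyVariables(OrdinalPatterns(m = 3)))
information(est, x, y) == association(est, x, y) # true, since `information` forwards to `association`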
Associations.MultivariateInformationMeasureEstimator
— TypeMultivariateInformationMeasureEstimator
The supertype for all estimators of multivariate information measures.
Generic implementations
- JointProbabilities
- EntropyDecomposition
- MIDecomposition
- CMIDecomposition
Dedicated implementations
- KraskovStögbauerGrassberger1
- KraskovStögbauerGrassberger2
- GaoOhViswanath
- GaoKannanOhViswanath
- GaussianMI
Estimators
Generic estimators
We provide a set of generic estimators, each of which can be used to compute several different types of information measures.
Associations.JointProbabilities
— TypeJointProbabilities <: InformationMeasureEstimator
JointProbabilities(
definition::MultivariateInformationMeasure,
discretization::Discretization
)
JointProbabilities
is a generic estimator for multivariate discrete information measures.
Usage
- Use with
association
to compute an information measure from input data.
Description
It first encodes the input data according to the given discretization
, then constructs probs
, a multidimensional Probabilities
instance. Finally, probs
are forwarded to a PlugIn
estimator, which computes the measure according to definition
.
Compatible encoding schemes
- CodifyVariables (encode each variable/column of the input data independently by applying an encoding in a sliding window over each input variable).
- CodifyPoints (encode each point/row of the input data).
Works for any OutcomeSpace
that implements codify
.
Using JointProbabilities
to compute an information measure, e.g. conditional mutual information, is typically slower than dedicated estimation procedures like EntropyDecomposition
. The reason is that measures such as CMIShannon
can be formulated as a sum of four entropies, which can be estimated individually and summed afterwards. This decomposition is fast because it avoids explicitly estimating the entire joint pmf, which demands many extra calculation steps. However, the decomposition is biased, because it fails to fully take into consideration the joint relationships between the variables. Pick your estimator according to your needs.
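As a minimal sketch (the measure and outcome space here are arbitrary choices for illustration), the estimator is constructed from a definition plus a discretization and passed to association:
using Associations
using Random; rng = Xoshiro(1234)
x, y = randn(rng, 1000), randn(rng, 1000)
# Discretize each variable into 3 bins, then estimate the measure from the joint pmf.
disc = CodifyVariables(ValueBinning(3))
est = JointProbabilities(ConditionalEntropyShannon(base = 2), disc)
association(est, x, y)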
See also: Counts
, Probabilities
, ProbabilitiesEstimator
, OutcomeSpace
, DiscreteInfoEstimator
.
Associations.EntropyDecomposition
— TypeEntropyDecomposition(definition::MultivariateInformationMeasure,
est::DifferentialInfoEstimator)
EntropyDecomposition(definition::MultivariateInformationMeasure,
est::DiscreteInfoEstimator,
discretization::CodifyVariables{<:OutcomeSpace},
pest::ProbabilitiesEstimator = RelativeAmount())
Estimate the multivariate information measure specified by definition
by rewriting its formula into some combination of entropy terms.
If calling the second method (discrete variant), then discretization is always done per variable/column and each column is encoded into integers using codify
.
Usage
- Use with
association
to compute a MultivariateInformationMeasure
from input data: association
(est::EntropyDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Description
The entropy terms are estimated using est
, and then combined to form the final estimate of definition
. No bias correction is applied. If est
is a DifferentialInfoEstimator
, then discretization
and pest
are ignored. If est
is a DiscreteInfoEstimator
, then discretization
and a probabilities estimator pest
must also be provided (defaulting to RelativeAmount
, which uses naive plug-in probabilities).
Compatible differential information estimators
If using the first signature, any compatible DifferentialInfoEstimator
can be used.
Compatible outcome spaces for discrete estimation
If using the second signature, the following outcome spaces can be used for discretization. Note that not all outcome spaces will work with all measures.
| Estimator | Principle |
|---|---|
| UniqueElements | Count of unique elements |
| ValueBinning | Binning (histogram) |
| OrdinalPatterns | Ordinal patterns |
| Dispersion | Dispersion patterns |
| BubbleSortSwaps | Sorting complexity |
| CosineSimilarityBinning | Cosine similarities histogram |
Bias
Estimating the definition
by decomposition into a combination of entropy terms, which are estimated independently, will in general be more biased than using a dedicated estimator. One reason is that the decomposition may miss crucial information in the joint space. To remedy this, dedicated information measure estimators typically derive the marginal estimates by first considering the joint space, and then apply some clever trick to eliminate the bias introduced by a naive decomposition. Unless specified below, no bias correction is applied for EntropyDecomposition
.
Handling of overlapping parameters
If there are overlapping parameters between the measure to be estimated, and the lower-level decomposed measures, then the top-level measure parameter takes precedence. For example, if we want to estimate CMIShannon(base = 2)
through a decomposition of entropies using the Kraskov(Shannon(base = ℯ))
Shannon entropy estimator, then base = 2
is used.
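A small sketch of this precedence rule (the particular estimator is just for illustration):
using Associations
# The top-level `base = 2` of CMIShannon takes precedence over the `base = ℯ`
# of the lower-level Shannon entropy estimator.
est = EntropyDecomposition(CMIShannon(base = 2), Kraskov(Shannon(base = ℯ), k = 10))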
Not all measures have the property that they can be decomposed into more fundamental information theoretic quantities. For example, MITsallisMartin
can be decomposed into a combination of marginal entropies, while MIRenyiSarbu
cannot. An error will be thrown if decomposition is not possible.
Discrete entropy decomposition
The second signature is for discrete estimation using DiscreteInfoEstimator
s, for example PlugIn
. The given discretization
scheme (typically an OutcomeSpace
) controls how the joint/marginals are discretized, and the probabilities estimator pest
controls how probabilities are estimated from counts.
Like for DifferentialInfoEstimator
, using a dedicated estimator for the measure in question will be more reliable than using a decomposition estimate. Here's how the different discretizations are applied:
- ValueBinning. Bin visitation frequencies are counted in the joint space XY, then marginal visitations are obtained from the joint bin visits. This behaviour is the same for both FixedRectangularBinning and RectangularBinning (which adapts the grid to the data). When using FixedRectangularBinning, the range along the first dimension is used as a template for all other dimensions. This is a bit slower than naively binning each marginal, but lessens bias.
- OrdinalPatterns. Each timeseries is separately codify-ed according to its ordinal pattern (no bias correction).
- Dispersion. Each timeseries is separately codify-ed according to its dispersion pattern (no bias correction).
Examples
- Example 1: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with ValueBinning.
- Example 2: MIShannon estimation using decomposition into discrete Shannon entropy estimated using CodifyVariables with BubbleSortSwaps.
- Example 3: MIShannon estimation using decomposition into differential Shannon entropy estimated using the Kraskov estimator.
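A minimal sketch of both call signatures (measure, outcome space, and parameters are chosen only for illustration):
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# Differential variant: decompose MIShannon into differential Shannon entropies (Kraskov).
est_diff = EntropyDecomposition(MIShannon(base = 2), Kraskov(Shannon(), k = 10))
association(est_diff, x, y)
# Discrete variant: decompose into discrete entropies using a per-variable
# value-binning discretization and plug-in (RelativeAmount) probabilities.
disc = CodifyVariables(ValueBinning(3))
est_disc = EntropyDecomposition(MIShannon(base = 2), PlugIn(Shannon()), disc, RelativeAmount())
association(est_disc, x, y)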
See also: MutualInformationEstimator
, MultivariateInformationMeasure
.
Associations.MIDecomposition
— TypeMIDecomposition(definition::MultivariateInformationMeasure,
est::MutualInformationEstimator)
Estimate the MultivariateInformationMeasure
specified by definition
by decomposing the measure, if possible, into a combination of mutual information terms. These terms are individually estimated using the given MutualInformationEstimator
est
, and finally combined to form the final value of the measure.
Usage
- Use with
association
to compute a MultivariateInformationMeasure
from input data: association
(est::MIDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Examples
- Example 1: Estimating CMIShannon using a decomposition into MIShannon terms using the KraskovStögbauerGrassberger1 mutual information estimator.
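A minimal sketch (the k value and the test data are for illustration only):
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
# Decompose CMIShannon into MIShannon terms, each estimated with the KSG1 estimator.
est = MIDecomposition(CMIShannon(), KSG1(; k = 10))
association(est, x, z, y) # CMI between x and z, given y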
See also: MultivariateInformationMeasureEstimator
.
Associations.CMIDecomposition
— TypeCMIDecomposition(definition::MultivariateInformationMeasure,
est::ConditionalMutualInformationEstimator)
Estimate some multivariate information measure specified by definition
, by decomposing it into a combination of conditional mutual information terms.
Usage
- Use with
association
to compute a MultivariateInformationMeasure
from input data: association
(est::CMIDecomposition, x...)
. - Use with some
IndependenceTest
to test for independence between variables.
Description
Each of the conditional mutual information terms is estimated using est
, which can be any ConditionalMutualInformationEstimator
. Finally, these estimates are combined according to the relevant decomposition formula.
This estimator is similar to EntropyDecomposition
, but definition
is expressed as conditional mutual information terms instead of entropy terms.
Examples
- Example 1: Estimating TEShannon by decomposing it into CMIShannon, which is estimated using the FPVP estimator.
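A minimal sketch (parameters are for illustration only):
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
# Decompose TEShannon into CMIShannon terms, each estimated with the FPVP estimator.
est = CMIDecomposition(TEShannon(), FPVP(; k = 10))
association(est, x, y) # transfer entropy from x to y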
See also: ConditionalMutualInformationEstimator
, MultivariateInformationMeasureEstimator
, MultivariateInformationMeasure
.
Mutual information estimators
Associations.MutualInformationEstimator
— TypeMutualInformationEstimator
The supertype for dedicated MutualInformation
estimators.
Concrete implementations
Associations.KraskovStögbauerGrassberger1
— TypeKSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger1
mutual information estimator (you can use KSG1
for short) is the $I^{(1)}$ k
-th nearest neighbor estimator from Kraskov et al. (2004).
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals; can be any metric from Distances.jl. Defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG1(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.KraskovStögbauerGrassberger2
— TypeKSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger2
mutual information estimator (you can use KSG2
for short) is the $I^{(2)}$ k
-th nearest neighbor estimator from (Kraskov et al., 2004).
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals; can be any metric from Distances.jl. Defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Description
Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X}_2, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.
The KraskovStögbauerGrassberger2
estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k
-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev
metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.
Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).
Shannon mutual information between the variables $\bf{X}_1, \bf{X}_2, \ldots, \bf{X}_m$ is then estimated as
\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(KSG2(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaoKannanOhViswanath
— TypeGaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(definition = MIShannon(); k = 1, w = 0)
The GaoKannanOhViswanath
(Shannon) estimator is designed for estimating Shannon mutual information between variables that may be either discrete, continuous or a mixture of both (Gao et al., 2017).
Compatible definitions
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k
-nearest neighbor distances from empirical samples.
The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."
In Gao et al. (2017), they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1
estimator for continuous-valued data. However, KraskovStögbauerGrassberger1
uses the digamma function, while GaoKannanOhViswanath
uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.
Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KraskovStögbauerGrassberger1
approach. The KraskovStögbauerGrassberger1
estimator uses k
-th neighbor distances in the joint space, while the GaoKannanOhViswanath
algorithm selects the maximum k
-th nearest distances among the two marginal spaces, which are in general not the same as the k
-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath
. We have modified it in a way that mimics KraskovStögbauerGrassberger1
for continuous data. Note that because of using the log
function instead of digamma
, there will be slight differences between the methods. See the source code for more details.
Even though the GaoKannanOhViswanath
estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSet
s whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to association
.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoKannanOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaoOhViswanath
— TypeGaoOhViswanath <: MutualInformationEstimator
GaoOhViswanath(definition = MIShannon(); k = 1, w = 0)
The GaoOhViswanath
is a mutual information estimator based on nearest neighbors, also known as the bias-improved-KSG estimator, or BI-KSG (Gao et al., 2018).
Compatible definitions
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
The estimator is given by
\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]
where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\mathcal{l}_2$-ball.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaoOhViswanath(; k = 10), x, y) # should be near 0 (and can be negative)
Associations.GaussianMI
— TypeGaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)
GaussianMI
is a parametric estimator for Shannon mutual information.
Compatible definitions
Usage
- Use with
association
to compute Shannon mutual information from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
Given $d_x$-dimensional and $d_y$-dimensional input data X
and Y
, GaussianMI
first constructs the $d_x + d_y$-dimensional joint StateSpaceSet
XY
. If normalize == true
, then we follow the approach in Vejmelka and Paluš (2008) and transform each column in XY
to have zero mean and unit standard deviation. If normalize == false
, then the algorithm proceeds without normalization.
Next, $\Sigma$, the correlation matrix for the (normalized) joint data XY, is computed. GaussianMI
assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations, so the mutual information can be computed directly from $\Sigma$.
The mutual information (for normalize == false
) is then estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \log \left( \dfrac{\det(\Sigma_X) \det(\Sigma_Y)}{\det(\Sigma)} \right),\]
where $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as
\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]
If normalize == true
, then the mutual information is estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \sigma_i,\]
where $\sigma_i$ are the eigenvalues for $\Sigma$.
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000); y = rand(rng, 10000)
association(GaussianMI(), x, y) # should be near 0 (and can be negative)
Conditional mutual information estimators
Associations.ConditionalMutualInformationEstimator
— TypeConditionalMutualInformationEstimator
The supertype for dedicated ConditionalMutualInformation
estimators.
Concrete implementations
Associations.GaussianCMI
— TypeGaussianCMI <: ConditionalMutualInformationEstimator
GaussianCMI(definition = CMIShannon(); normalize::Bool = false)
GaussianCMI
is a parametric ConditionalMutualInformationEstimator
(Vejmelka and Paluš, 2008).
Compatible definitions
Usage
- Use with
association
to compute CMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
GaussianCMI
estimates Shannon CMI through a sum of two mutual information terms that each are estimated using GaussianMI
(the normalize
keyword is the same as for GaussianMI
):
\[\hat{I}_{Gaussian}(X; Y | Z) = \hat{I}_{Gaussian}(X; Y, Z) - \hat{I}_{Gaussian}(X; Z)\]
Examples
- Example 1. Estimating
CMIShannon
.
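A minimal sketch, analogous to the examples for the other conditional mutual information estimators on this page:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(GaussianCMI(), x, z, y) # CMIShannon between x and z, given y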
Associations.FPVP
— TypeFPVP <: ConditionalMutualInformationEstimator
FPVP(definition = CMIShannon(); k = 1, w = 0)
The Frenzel-Pompe-Vejmelka-Paluš (or FPVP
for short) ConditionalMutualInformationEstimator
is used to estimate the conditional mutual information using a k
-th nearest neighbor approach that is analogous to that of the KraskovStögbauerGrassberger1
mutual information estimator from Frenzel and Pompe (2007) and Vejmelka and Paluš (2008).
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to compute a ConditionalMutualInformation
measure from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
- Example 1: Estimating
CMIShannon
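A minimal sketch, analogous to the examples for the other nearest-neighbor estimators on this page:
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(FPVP(; k = 10), x, z, y) # should be near 0 (and can be negative)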
Associations.MesnerShalizi
— TypeMesnerShalizi <: ConditionalMutualInformationEstimator
MesnerShalizi(definition = CMIShannon(); k = 1, w = 0)
The MesnerShalizi
ConditionalMutualInformationEstimator
is designed for data that can be mixtures of discrete and continuous data (Mesner and Shalizi, 2020).
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to compute CMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(MesnerShalizi(; k = 10), x, z, y) # should be near 0 (and can be negative)
Associations.Rahimzamani
— TypeRahimzamani <: ConditionalMutualInformationEstimator
Rahimzamani(k = 1, w = 0)
The Rahimzamani
ConditionalMutualInformationEstimator
is designed for data that can be mixtures of discrete and continuous data (Rahimzamani et al., 2018).
Compatible definitions
Usage
- Use with
association
to compute CMIShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
This estimator is very similar to the GaoKannanOhViswanath
mutual information estimator, but has been expanded to the conditional mutual information case.
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(Rahimzamani(; k = 10), x, z, y) # should be near 0 (and can be negative)
Associations.PoczosSchneiderCMI
— TypePoczosSchneiderCMI <: ConditionalMutualInformationEstimator
PoczosSchneiderCMI(definition = CMIRenyiPoczos(); k = 1, w = 0)
The PoczosSchneiderCMI
ConditionalMutualInformationEstimator
computes conditional mutual informations using a k
-th nearest neighbor approach (Póczos and Schneider, 2012).
k
is the number of nearest neighbors. w
is the Theiler window, which controls the number of temporal neighbors that are excluded during neighbor searches.
Compatible definitions
Usage
- Use with
association
to compute CMIRenyiPoczos
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Examples
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
association(PoczosSchneiderCMI(CMIRenyiPoczos(), k = 10), x, z, y) # should be near 0 (and can be negative)
Transfer entropy estimators
Associations.TransferEntropyEstimator
— TypeThe supertype of all dedicated transfer entropy estimators.
Associations.Zhu1
— TypeZhu1 <: TransferEntropyEstimator
Zhu1(k = 1, w = 0, base = MathConstants.e)
The Zhu1
transfer entropy estimator (Zhu et al., 2015) for normalized input data (as described in Zhu et al. (2015)), applicable to both pairwise and conditional transfer entropy.
Compatible definitions
Usage
- Use with
association
to compute TEShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Description
This estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x
using k
nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh et al. (2003).
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
Description
For a given point jᵢ in the joint embedding space
, this estimator first computes the distance dᵢ
from jᵢ
to its k
-th nearest neighbor. Then, for each point mₖ[i]
in the k
-th marginal space, it counts the number of points within radius dᵢ
.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k
references the three marginal subspaces T
, TTf
and ST
for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner et al. (2011)'s equation 5 and 6).
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Zhu1(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
Associations.Lindner
— TypeLindner <: TransferEntropyEstimator
Lindner(definition = Shannon(); k = 1, w = 0, base = 2)
The Lindner
transfer entropy estimator (Lindner et al., 2011), which is also used in the Trentool MATLAB toolbox, and is based on nearest neighbor searches.
Compatible definitions
Usage
- Use with
association
to compute TEShannon
from input data. - Use with some
IndependenceTest
to test for independence between variables.
Keyword parameters
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
The estimator can be used both for pairwise and conditional transfer entropy estimation.
Description
For a given point jᵢ in the joint embedding space
, this estimator first computes the distance dᵢ
from jᵢ
to its k
-th nearest neighbor. Then, for each point mₖ[i]
in the k
-th marginal space, it counts the number of points within radius dᵢ
.
The Shannon transfer entropy is then computed as
\[TE_S(X \to Y) = \psi(k) + \dfrac{1}{N} \sum_{i}^n \left[ \sum_{k=1}^3 \left( \psi(m_k[i] + 1) \right) \right],\]
where the index k
references the three marginal subspaces T
, TTf
and ST
for which neighbor searches are performed. Here this estimator has been modified to allow for conditioning too (a simple modification to Lindner et al. (2011)'s equation 5 and 6).
Example
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
z = rand(rng, 10000) .+ y
est = Lindner(TEShannon(), k = 10)
association(est, x, z, y) # should be near 0 (and can be negative)
Associations.SymbolicTransferEntropy
— TypeSymbolicTransferEntropy <: TransferEntropyEstimator
SymbolicTransferEntropy(definition = TEShannon(); m = 3, τ = 1,
lt = ComplexityMeasures.isless_rand)
A convenience estimator for symbolic transfer entropy (Staniek and Lehnertz, 2008).
Compatible measures
Description
Symbolic transfer entropy consists of two simple steps. First, the input time series are encoded using codify
with the CodifyVariables
discretization and the OrdinalPatterns
outcome space. This transforms the input time series into integer time series. Transfer entropy is then estimated as usual on the encoded time series, with the embedding dictated by definition
, using the JointProbabilities
estimator.
Examples
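A minimal sketch (the embedding parameter m = 3 is an arbitrary illustrative choice):
using Associations
using Random; rng = MersenneTwister(1234)
x = rand(rng, 10000)
y = rand(rng, 10000) .+ x
est = SymbolicTransferEntropy(TEShannon(); m = 3)
association(est, x, y) # transfer entropy from x to y, on the ordinal-encoded series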
Associations.Hilbert
— TypeHilbert(est;
source::InstantaneousSignalProperty = Phase(),
target::InstantaneousSignalProperty = Phase(),
cond::InstantaneousSignalProperty = Phase())
<: TransferDifferentialEntropyEstimator
Compute transfer entropy on instantaneous phases/amplitudes of relevant signals, which are obtained by first applying the Hilbert transform to each signal, then extracting the phases/amplitudes of the resulting complex numbers (Paluš, 2014). Original time series are thus transformed to instantaneous phase/amplitude time series. Transfer entropy is then estimated using the provided est
on those phases/amplitudes (use e.g. ValueBinning
, or OrdinalPatterns
).
Details on estimation of the transfer entropy (conditional mutual information) following the phase/amplitude extraction step are not given in Paluš (2014). Here, after instantaneous phases/amplitudes have been obtained, they are treated as regular time series, from which transfer entropy is then computed as usual.
Associations.Phase
— TypePhase <: InstantaneousSignalProperty
Indicates that the instantaneous phases of a signal should be used.
Associations.Amplitude
— TypeAmplitude <: InstantaneousSignalProperty
Indicates that the instantaneous amplitudes of a signal should be used.
A small tutorial
Associations.jl extends the single-variate information API in ComplexityMeasures.jl to information measures of multiple variables.
Definitions
We define "information measure" as some functional of probability mass functions or probability densities. This definition may or may not agree with literature usage, depending on the context. We made this choice pragmatically based on user-friendlyness and coding-friendlyness, but still trying to maintain some level of meaningful terminology.
Upon doing a literature review on the possible variants of information theoretic measures, it become painstakingly obvious that authors use the same name for different concepts. For novices, and experienced practitioners too, this can be confusing. Our API clearly distinguishes between methods that are conceptually the same but named differently in the literature due to differing estimation strategies, from methods that actually have different definitions.
- Multiple, equivalent definitions occur, for example, for the Shannon mutual information (MI; MIShannon), which has both a discrete and a continuous version, and there are multiple equivalent mathematical formulas for them: a direct sum/integral over a joint probability mass function (pmf), a sum of three entropy terms, and a Kullback-Leibler divergence between the joint pmf and the product of the marginal distributions. Since these definitions are all equivalent, we only need one type (MIShannon) to represent them.
- But Shannon MI is not the only type of mutual information! For example, "Tsallis mutual information" has been proposed in different variants by various authors. Despite sharing the same name, these are actually nonequivalent definitions. We've thus assigned them entirely different measure names (e.g. MITsallisFuruichi and MITsallisMartin), with the author name at the end.
Basic estimation strategy
To estimate a multivariate information measure in practice, you must first specify the definition of the measure, which is then used as input to an estimator of that measure. This estimator is then given to association
. Every information measure has at least one estimator: JointProbabilities
. Many measures have several additional estimators, and an overview can be found in the docstring for MultivariateInformationMeasureEstimator
.
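A minimal sketch of this workflow (the measure and discretization are chosen for illustration):
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# 1. Choose a measure definition; 2. wrap it in an estimator; 3. call `association`.
def = MIShannon(base = 2)
est = JointProbabilities(def, CodifyVariables(OrdinalPatterns(m = 3)))
association(est, x, y)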
Distances/divergences
There are many information measures in the literature that aim to quantify the distance/divergence between two probability mass functions (pmf) or densities. You can find those that we implement here.
As an example, let's quantify the KLDivergence
between two probability mass functions estimated by symbolizing two input vectors x
and y
using OrdinalPatterns
. Since the discrete KLDivergence
can be expressed as a function of a joint pmf, we can use the JointProbabilities
estimator.
using Associations
using Random; rng = MersenneTwister(1234)
x, y = rand(rng, 1000), rand(rng, 1000)
# Specify a discretization. We discretize per column.
disc = CodifyVariables(OrdinalPatterns(m=2))
def = KLDivergence()
est = JointProbabilities(def, disc)
association(est, x, y) # should be close to 0
4.625912964358308e-5
Divergences are examples of asymmetric information measures, which we can see by flipping the order of the input data.
association(est, y, x)
4.625888243106873e-5
Conditional entropies
Conditional entropies are another example of asymmetric information measures. They all have in common that they are functions of a joint pmf, and can therefore also be estimated using the JointProbabilities
estimator. This time, we'll use a rectangular binning with 3 bins along each dimension to discretize the data.
using Associations
using Random; rng = Xoshiro(1234)
x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(ValueBinning(3))
def = ConditionalEntropyShannon(base = 2)
est = JointProbabilities(def, disc)
association(est, x, y)
1.17479123360062
Joint entropies
Joint entropies, on the other hand, are symmetric. Joint entropies are functionals of a joint pmf, so we can still use the JointProbabilities
estimator. This time, we use a Dispersion
based discretization.
using Associations
using Random; rng = Xoshiro(1234)
x, y = randn(rng, 1000), randn(rng, 1000)
disc = CodifyVariables(Dispersion())
est = JointProbabilities(JointEntropyShannon(base = 2), disc)
association(est, x, y) ≈ association(est, y, x) # should be true
true
Mutual informations
Mutual information, in particular MIShannon
, is an often-used symmetric measure for quantifying the (possibly nonlinear) association between variables. It appears in both discrete and differential form, and can be estimated in a multitude of ways. For example, one can use dedicated MutualInformationEstimator
s such as KraskovStögbauerGrassberger2
or GaussianMI
:
using Associations
using StateSpaceSets
using Random; rng = Xoshiro(1234)
# We'll construct two state space sets, to illustrate how we can discretize
# multidimensional inputs using `CodifyPoints`.
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 2))
est = KSG1(MIShannon(base = 2), k = 10)
association(est, x, y)
-0.021186834467335727
The result should be symmetric:
association(est, x, y) ≈ association(est, y, x) # should be true
true
One can also estimate mutual information using the EntropyDecomposition
estimator, or (like above) using the JointProbabilities
estimator. Let's construct a differential entropy based estimator based on the Kraskov
estimator.
est_diff = EntropyDecomposition(MIShannon(base = 2), Kraskov(Shannon(), k=10))
EntropyDecomposition{MIShannon{Int64}, Kraskov{Shannon{Int64}, Int64}, Nothing, Nothing}(MIShannon{Int64}(2), Kraskov(definition = Shannon(base = 2), k = 10, w = 0, base = 2), nothing, nothing)
association(est_diff, x, y)
-0.3826780635557282
We can also construct a discrete entropy based estimator based on e.g. PlugIn
estimator of Shannon
entropy.
# We know that `x` and `y` were generated from a uniform distribution above,
# so we set the minimum and maximum values of the encoding to 0 and 1,
# respectively.
encoding = RelativeMeanEncoding(0.0, 1.0; n = 4)
disc = CodifyPoints(encoding)
est_disc = JointProbabilities(MIShannon(base = 2), disc)
association(est_disc, x, y)
0.00494036679990649
JointProbabilities: fine-grained discretization control
For numerical data, we can estimate both counts and probabilities using CodifyVariables
with any count-based OutcomeSpace
. Here, we'll estimate MIShannon
using one type of encoding for the first variable, and another type of encoding for the second variable.
using Associations
using Random; rng = Xoshiro(1234)
x, y = rand(rng, 100), rand(rng, 100)
# We must use outcome spaces with the same number of total outcomes.
# This works because these outcome spaces construct embedding points
# ("windows") in the same way/order; be careful if that isn't the case!
ox = CosineSimilarityBinning(nbins = factorial(3))
oy = OrdinalPatterns(m = 3)
# Now estimate mutual information
discretization = CodifyVariables((ox, oy))
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)
0.10752906020545049
For more fine-grained control than CodifyVariables
can offer, we can use CodifyPoints
with one or several Encoding
s. Here's how we can estimate MIShannon
on multivariate input data by discretizing each input variable in arbitrary ways.
using Associations
using Random; rng = Xoshiro(1234)
x, y = StateSpaceSet(rand(rng, 1000, 2)), StateSpaceSet(rand(rng, 1000, 3))
# min/max of the `rand` call is 0 and 1
precise = true # precise bin edges
r = range(0, 1; length = 3)
binning = FixedRectangularBinning(r, dimension(x), precise)
encoding_x = RectangularBinEncoding(binning, x)
encoding_y = CombinationEncoding(RelativeMeanEncoding(0.0, 1, n = 2), OrdinalPatternEncoding(3))
discretization = CodifyPoints(encoding_x, encoding_y)
# Now estimate mutual information
est = JointProbabilities(MIShannon(), discretization)
association(est, x, y)
0.025991777755365167