Mutual information API
The mutual information (MI) API is defined by
- MutualInformation, and its subtypes,
- mutualinfo,
- MutualInformationEstimator, and its subtypes.
CausalityTools.mutualinfo — Function
mutualinfo([measure::MutualInformation], m::ContingencyMatrix) → mi::Real
Estimate the mutual information between x and y, the variables corresponding to the columns and rows of the 2-dimensional contingency matrix m, respectively.
Estimates the discrete version of the given MutualInformation measure from its direct definition (double-sum), using the probabilities from a pre-computed ContingencyMatrix. If measure is not given, then the default is MIShannon().
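A minimal sketch of this call signature, not taken from the docstring above: the data and the `contingency_matrix` constructor call are illustrative assumptions, so check your installed version for the exact way to build a ContingencyMatrix.

```julia
using CausalityTools

# Hypothetical categorical data; any two discrete variables of equal length work.
x = rand(["a", "b", "c"], 1000)
y = rand(["hello", "world"], 1000)

# Assumed constructor for the contingency matrix (verify against your version).
c = contingency_matrix(x, y)

# Discrete Shannon MI from the double-sum definition; MIShannon() is the default measure.
mutualinfo(MIShannon(base = 2), c)
```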
mutualinfo([measure::MutualInformation], est::ProbabilitiesEstimator, x, y) → mi::Real ∈ [0, a]
Estimate the mutual information between x and y using the discrete version of the given measure, using the given ProbabilitiesEstimator est (which must accept multivariate data and have an implementation for marginal_encodings). See examples here. If measure is not given, then the default is MIShannon().
Estimators
The mutual information is computed as a sum of three entropy terms, without any bias correction. The exception is when using Contingency; then the mutual information is computed using a ContingencyMatrix.
Joint and marginal probabilities are computed by jointly discretizing x and y using the approach given by est (using marginal_encodings), and obtaining marginal distributions from the joint distribution.
Estimator | Principle | MIShannon | MITsallisFuruichi | MITsallisMartin | MIRenyiJizba | MIRenyiSarbu |
---|---|---|---|---|---|---|
Contingency | Contingency table | ✓ | ✓ | ✓ | ✓ | ✓ |
CountOccurrences | Frequencies | ✓ | ✓ | ✓ | ✓ | ✖ |
ValueHistogram | Binning (histogram) | ✓ | ✓ | ✓ | ✓ | ✖ |
SymbolicPermutation | Ordinal patterns | ✓ | ✓ | ✓ | ✓ | ✖ |
Dispersion | Dispersion patterns | ✓ | ✓ | ✓ | ✓ | ✖ |
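A hedged usage sketch for this signature, using one of the estimators from the table above. The data and binning choice are illustrative assumptions, not part of the documentation.

```julia
using CausalityTools

x, y = randn(2000), randn(2000)

# Jointly discretize x and y with a fixed rectangular binning (4 bins per axis).
est = ValueHistogram(RectangularBinning(4))

mutualinfo(est, x, y)                      # defaults to MIShannon()
mutualinfo(MITsallisFuruichi(), est, x, y) # any measure with a ✓ in the table above
```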
mutualinfo([measure::MutualInformation], est::DifferentialEntropyEstimator, x, y)
Estimate the mutual information measure between x and y by a sum of three entropy terms, without any bias correction, using any DifferentialEntropyEstimator compatible with multivariate data. See examples here. If measure is not given, then the default is MIShannon().
DifferentialEntropyEstimators have their own base field which is not used here. Instead, this method creates a copy of est internally, where est.base is replaced by measure.e.base. Therefore, use measure to control the "unit" of the mutual information.
Estimators
Some MutualInformation measures can be computed using a DifferentialEntropyEstimator, provided it supports multivariate input data. These estimators compute mutual information as a sum of entropy terms (with different dimensions), without any bias correction.
Estimator | Principle | MIShannon | MITsallisFuruichi | MITsallisMartin | MIRenyiJizba | MIRenyiSarbu |
---|---|---|---|---|---|---|
Kraskov | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
Zhu | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
ZhuSingh | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
Gao | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
Goria | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
Lord | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
LeonenkoProzantoSavani | Nearest neighbors | ✓ | ✖ | ✖ | ✖ | ✖ |
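A hedged sketch of this signature with the Kraskov entropy estimator from the table above; the data and parameter values are illustrative.

```julia
using CausalityTools

x, y = randn(2000), randn(2000)

# MI as a sum of three differential entropy terms, each estimated with Kraskov's
# k-nearest-neighbor entropy estimator. The unit is controlled via the measure's base.
mutualinfo(MIShannon(; base = ℯ), Kraskov(k = 5), x, y)  # result in nats
```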
mutualinfo([measure::MutualInformation], est::MutualInformationEstimator, x, y)
Estimate the mutual information measure between x and y using the dedicated MutualInformationEstimator est. See examples here. If measure is not given, then the default is MIShannon().
Estimators
Dedicated MutualInformationEstimators are either discrete, continuous, or a mixture of both. Typically, these estimators apply bias correction.
Estimator | Type | MIShannon |
---|---|---|
GaussianMI | Parametric | ✓ |
KSG1 | Continuous | ✓ |
KSG2 | Continuous | ✓ |
GaoKannanOhViswanath | Mixed | ✓ |
GaoOhViswanath | Continuous | ✓ |
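A hedged sketch for this signature, calling two of the dedicated estimators from the table above; the data and parameters are illustrative.

```julia
using CausalityTools

x = randn(2000)
y = 0.6 .* x .+ randn(2000)   # correlated with x, so MI should be clearly positive

mutualinfo(MIShannon(), KSG1(k = 5), x, y)   # Kraskov–Stögbauer–Grassberger, algorithm 1
mutualinfo(MIShannon(), GaussianMI(), x, y)  # parametric; assumes jointly Gaussian data
```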
Definitions
CausalityTools.MutualInformation — Type
MutualInformation <: AssociationMeasure
The supertype of all mutual information measures. Concrete subtypes are
- MIShannon
- MITsallisFuruichi
- MITsallisMartin
- MIRenyiJizba
- MIRenyiSarbu
CausalityTools.MutualInformationEstimator — Type
MutualInformationEstimator
The supertype of all dedicated mutual information estimators.
MutualInformationEstimators can be either discrete, continuous, or a mixture of both. Each estimator uses a specialized technique to approximate relevant densities/integrals and/or probabilities, and is typically tailored to a specific type of MutualInformation (mostly MIShannon).
GaussianMI (parametric)
CausalityTools.GaussianMI — Type
GaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)
GaussianMI is a parametric estimator for Shannon mutual information.
Description
Given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the $d_x + d_y$-dimensional joint StateSpaceSet XY. If normalize == true, then we follow the approach in Vejmelka and Paluš (2008) and transform each column in XY to have zero mean and unit standard deviation. If normalize == false, then the algorithm proceeds without normalization.
Next, $\Sigma$, the correlation matrix for the (normalized) joint data XY, is computed. GaussianMI assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations.
The mutual information (for normalize == false) is then estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \log \left( \dfrac{ \det(\Sigma_X) \det(\Sigma_Y) }{\det(\Sigma)} \right),\]
where $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as
\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]
If normalize == true, then the mutual information is estimated as
\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \log(\sigma_i),\]
where $\sigma_i$ are the eigenvalues of $\Sigma$.
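A sketch tying the estimator to the normalize == false formula above, computing the closed-form expression directly from the joint correlation matrix. The data are illustrative assumptions, and for univariate marginals the two values should agree closely (only approximately in general).

```julia
using CausalityTools, LinearAlgebra, Statistics

x = randn(10_000)
y = 0.7 .* x .+ randn(10_000)

Σ  = cor(hcat(x, y))          # correlation matrix of the joint data XY
Σx = Σ[1:1, 1:1]              # marginal block for X
Σy = Σ[2:2, 2:2]              # marginal block for Y

# Closed-form Gaussian MI (natural log ⇒ nats), as in the formula above.
mi_direct = 0.5 * log(det(Σx) * det(Σy) / det(Σ))

# Should be close to the estimator's output when using the same (natural-log) base.
mi_est = mutualinfo(MIShannon(; base = ℯ), GaussianMI(), x, y)
```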
KraskovStögbauerGrassberger1
CausalityTools.KraskovStögbauerGrassberger1 — Type
KSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger1 mutual information estimator (you can use KSG1 for short) is the $I^{(1)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals, which can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Description
Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.
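An illustrative sketch of the keyword arguments listed above; the parameter values and data are arbitrary assumptions.

```julia
using CausalityTools, Distances

x, y = randn(5000), randn(5000)

# k-th nearest neighbor count, Theiler window, and marginal metric are all tunable.
est = KSG1(k = 10, w = 0, metric_marginals = Euclidean())
mutualinfo(MIShannon(), est, x, y)
```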
KraskovStögbauerGrassberger2
CausalityTools.KraskovStögbauerGrassberger2 — Type
KSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())
The KraskovStögbauerGrassberger2 mutual information estimator (you can use KSG2 for short) is the $I^{(2)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).
Keyword arguments
- k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
- metric_marginals: The distance metric for the marginals, which can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
- w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.
Description
Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.
The KraskovStögbauerGrassberger2 estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.
Consider $x_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $x_i^m \in \bf{X}_m$ (in the marginal space).
Mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as
\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]
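An illustrative comparison of KSG2 and KSG1 on the same (assumed) data; for well-sampled continuous data the two algorithms typically give similar estimates.

```julia
using CausalityTools

x = randn(5000)
y = x .+ randn(5000)

mutualinfo(MIShannon(), KSG2(k = 5), x, y)
mutualinfo(MIShannon(), KSG1(k = 5), x, y)
```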
GaoKannanOhViswanath
CausalityTools.GaoKannanOhViswanath — Type
GaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(; k = 1, w = 0)
The GaoKannanOhViswanath (Shannon) estimator is designed for estimating mutual information between variables that may be either discrete, continuous, or a mixture of both (Gao et al., 2017).
Even though the GaoKannanOhViswanath estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSets whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to mutualinfo.
Description
The estimator starts by expressing mutual information in terms of the Radon–Nikodym derivative, and then estimates these derivatives using k-nearest neighbor distances from empirical samples.
The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."
In Gao et al. (2017), they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1 estimator for continuous-valued data. However, KraskovStögbauerGrassberger1 uses the digamma function, while GaoKannanOhViswanath uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.
Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KSG1 approach. The KSG1 estimator uses k-th neighbor distances in the joint space, while the GaoKannanOhViswanath algorithm selects the maximum k-th nearest distances among the two marginal spaces, which are in general not the same as the k-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath. We have modified it in a way that mimics KraskovStögbauerGrassberger1 for continuous data. Note that because of using the log function instead of digamma, there will be slight differences between the methods. See the source code for more details.
See also: mutualinfo.
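A hedged sketch of the mixed discrete/continuous use case described above: discrete symbols are integer-encoded and converted to floats before calling mutualinfo. The data and encoding are illustrative assumptions.

```julia
using CausalityTools

x = randn(2000)                          # continuous variable
s = rand(["low", "mid", "high"], 2000)   # discrete variable (symbols)

# Encode the symbols as integers, then convert to floats, as required above.
codes = Dict("low" => 1.0, "mid" => 2.0, "high" => 3.0)
y = [codes[si] for si in s]

mutualinfo(MIShannon(), GaoKannanOhViswanath(k = 5), x, y)
```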
GaoOhViswanath
CausalityTools.GaoOhViswanath — Type
GaoOhViswanath <: MutualInformationEstimator
The GaoOhViswanath mutual information estimator, also called the bias-improved-KSG estimator, or BI-KSG, by Gao et al. (2018), is given by
\[\begin{align*} \hat{H}_{GAO}(X, Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \\ & \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right) \end{align*},\]
where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\mathcal{l}_2$-ball.
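An illustrative sketch of calling the BI-KSG estimator with its default settings; the data are assumptions for demonstration only.

```julia
using CausalityTools

x = randn(5000)
y = 0.5 .* x .+ randn(5000)

mutualinfo(MIShannon(), GaoOhViswanath(), x, y)
```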