Mutual information API

The mutual information (MI) API is defined by

CausalityTools.mutualinfo (Function)
mutualinfo([measure::MutualInformation], m::ContingencyMatrix) → mi::Real

Estimate the mutual information between x and y, the variables corresponding to the columns and rows of the 2-dimensional contingency matrix m, respectively.

Estimates the discrete version of the given MutualInformation measure from its direct definition (double-sum), using the probabilities from a pre-computed ContingencyMatrix. If measure is not given, then the default is MIShannon().

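For instance, assuming a ContingencyMatrix has been constructed from two categorical series (the contingency_matrix call below is an assumption about the constructor; see the ContingencyMatrix docs for the exact signature), the discrete Shannon MI is obtained directly from the table:

```julia
using CausalityTools

x = rand(1:3, 1000)   # categorical series encoded as integers
y = rand(1:2, 1000)

# Assumption: `contingency_matrix(x, y)` builds the 2-dimensional
# ContingencyMatrix of joint frequencies for x and y.
c = contingency_matrix(x, y)

# Discrete Shannon MI from the double-sum definition.
mutualinfo(MIShannon(), c)
```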
mutualinfo([measure::MutualInformation], est::ProbabilitiesEstimator, x, y) → mi::Real ∈ [0, a]

Estimate the mutual information between x and y using the discrete version of the given measure, using the given ProbabilitiesEstimator est (which must accept multivariate data and have an implementation for marginal_encodings). See examples here. If measure is not given, then the default is MIShannon().

Estimators

The mutual information is computed as a sum of three entropy terms, without any bias correction. The exception is when using Contingency; then the mutual information is computed using a ContingencyMatrix.

Joint and marginal probabilities are computed by jointly discretizing x and y using the approach given by est (using marginal_encodings), and obtaining marginal distributions from the joint distribution.

| Estimator | Principle | MIShannon | MITsallisFuruichi | MITsallisMartin | MIRenyiJizba | MIRenyiSarbu |
|---|---|---|---|---|---|---|
| Contingency | Contingency table | ✓ | ✓ | ✓ | ✓ | ✓ |
| CountOccurrences | Frequencies | ✓ | ✓ | ✓ | ✓ | ✓ |
| ValueHistogram | Binning (histogram) | ✓ | ✓ | ✓ | ✓ | ✓ |
| SymbolicPermutation | Ordinal patterns | ✓ | ✓ | ✓ | ✓ | ✓ |
| Dispersion | Dispersion patterns | ✓ | ✓ | ✓ | ✓ | ✓ |
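A minimal sketch of this route, assuming ValueHistogram and RectangularBinning from ComplexityMeasures.jl are re-exported by CausalityTools:

```julia
using CausalityTools

x, y = randn(2000), randn(2000)

# Jointly discretize x and y into a histogram with 4 bins per dimension.
est = ValueHistogram(RectangularBinning(4))

# The measure defaults to MIShannon() if omitted; other measures from the
# table above can be substituted.
mutualinfo(MIShannon(), est, x, y)
```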
mutualinfo([measure::MutualInformation], est::DifferentialEntropyEstimator, x, y)

Estimate the mutual information measure between x and y by a sum of three entropy terms, without any bias correction, using any DifferentialEntropyEstimator compatible with multivariate data. See examples here. If measure is not given, then the default is MIShannon().

Note

DifferentialEntropyEstimators have their own base field which is not used here. Instead, this method creates a copy of est internally, where est.base is replaced by measure.e.base. Therefore, use measure to control the "unit" of the mutual information.
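For example, to report the mutual information in bits, the base would be set on the measure rather than on the estimator. This is a sketch; it assumes MIShannon accepts a base keyword and that the Kraskov estimator from ComplexityMeasures.jl is re-exported with a k keyword.

```julia
using CausalityTools

x, y = randn(1000), randn(1000)

# The estimator's own `base` field is ignored; the measure's base sets the unit.
# Assumes MIShannon takes a `base` keyword (base = 2 gives bits).
mutualinfo(MIShannon(; base = 2), Kraskov(k = 5), x, y)
```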

Estimators

Some MutualInformation measures can be computed using a DifferentialEntropyEstimator, provided it supports multivariate input data. These estimators compute mutual information as a sum of entropy terms (with different dimensions), without any bias correction.

| Estimator | Principle | MIShannon | MITsallisFuruichi | MITsallisMartin | MIRenyiJizba | MIRenyiSarbu |
|---|---|---|---|---|---|---|
| Kraskov | Nearest neighbors | ✓ | x | x | x | x |
| Zhu | Nearest neighbors | ✓ | x | x | x | x |
| ZhuSingh | Nearest neighbors | ✓ | x | x | x | x |
| Gao | Nearest neighbors | ✓ | x | x | x | x |
| Goria | Nearest neighbors | ✓ | x | x | x | x |
| Lord | Nearest neighbors | ✓ | x | x | x | x |
| LeonenkoProzantoSavani | Nearest neighbors | ✓ | x | x | x | x |
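A sketch of this estimation route, using nearest-neighbor entropy estimators re-exported from ComplexityMeasures.jl (the k keyword is assumed; check each estimator's docstring):

```julia
using CausalityTools

x, y = randn(3000), randn(3000)

# Shannon MI as a sum of three differential entropies, no bias correction.
mutualinfo(MIShannon(), Kraskov(k = 10), x, y)

# Other estimators from the table above are used the same way.
mutualinfo(MIShannon(), Zhu(k = 10), x, y)
```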
mutualinfo([measure::MutualInformation], est::MutualInformationEstimator, x, y)

Estimate the mutual information measure between x and y using the dedicated MutualInformationEstimator est. See examples here. If measure is not given, then the default is MIShannon().

Estimators

Dedicated MutualInformationEstimators are either discrete, continuous, or a mixture of both. Typically, these estimators apply bias correction.

| Estimator | Type | MIShannon |
|---|---|---|
| GaussianMI | Parametric | ✓ |
| KSG1 | Continuous | ✓ |
| KSG2 | Continuous | ✓ |
| GaoKannanOhViswanath | Mixed | ✓ |
| GaoOhViswanath | Continuous | ✓ |
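A quick sketch comparing two of the dedicated estimators on linearly coupled data (keyword defaults are documented in the estimator docstrings below):

```julia
using CausalityTools

x = randn(5000)
y = 0.7 .* x .+ 0.3 .* randn(5000)   # linearly coupled, so MI > 0

mutualinfo(MIShannon(), GaussianMI(), x, y)   # parametric estimate
mutualinfo(MIShannon(), KSG1(k = 5), x, y)    # dedicated nearest-neighbor estimate
```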

Definitions

MutualInformationEstimators

GaussianMI (parametric)

CausalityTools.GaussianMI (Type)
GaussianMI <: MutualInformationEstimator
GaussianMI(; normalize::Bool = false)

GaussianMI is a parametric estimator for Shannon mutual information.

Description

Given $d_x$-dimensional and $d_y$-dimensional input data X and Y, GaussianMI first constructs the $d_x + d_y$-dimensional joint StateSpaceSet XY. If normalize == true, then we follow the approach of Vejmelka and Paluš (2008) and transform each column in XY to have zero mean and unit standard deviation. If normalize == false, then the algorithm proceeds without normalization.

Next, $\Sigma$, the correlation matrix of the (possibly normalized) joint data XY, is computed. GaussianMI assumes the input variables are distributed according to normal distributions with zero means and unit standard deviations, and the mutual information is estimated from $\Sigma$.

For normalize == false, the mutual information is then estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = \dfrac{1}{2} \log \left( \dfrac{ \det(\Sigma_X) \det(\Sigma_Y) }{\det(\Sigma)} \right),\]

where $\Sigma_X$ and $\Sigma_Y$ appear in $\Sigma$ as

\[\Sigma = \begin{bmatrix} \Sigma_{X} & \Sigma^{'}\\ \Sigma^{'} & \Sigma_{Y} \end{bmatrix}.\]

If normalize == true, then the mutual information is estimated as

\[\hat{I}^S_{Gaussian}(X; Y) = -\dfrac{1}{2} \sum_{i = 1}^{d_x + d_y} \log(\sigma_i),\]

where $\sigma_i$ are the eigenvalues of $\Sigma$.

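A minimal usage sketch with the documented normalize keyword:

```julia
using CausalityTools

x = randn(10_000)
y = x .+ randn(10_000)   # bivariate Gaussian with correlation 1/sqrt(2)

# For this system the true Shannon MI is -log(1 - 0.5)/2 ≈ 0.347 nats
# (≈ 0.5 bits); the reported unit depends on the base of the measure.
mutualinfo(MIShannon(), GaussianMI(normalize = false), x, y)
mutualinfo(MIShannon(), GaussianMI(normalize = true), x, y)
```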

KraskovStögbauerGrassberger1

CausalityTools.KraskovStögbauerGrassberger1 (Type)
KSG1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1 <: MutualInformationEstimator
KraskovStögbauerGrassberger1(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger1 mutual information estimator (you can use KSG1 for short) is the $I^{(1)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).

Keyword arguments

  • k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
  • metric_marginals: The distance metric for the marginals; can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
  • w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Description

Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.

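A usage sketch illustrating the keyword arguments (Chebyshev() is taken from Distances.jl, matching the documented default):

```julia
using CausalityTools
using Distances: Chebyshev

x, y = randn(2000), randn(2000)

# k nearest neighbors, a Theiler window of 10 samples, and the default
# Chebyshev metric for the marginal spaces.
est = KSG1(k = 5, w = 10, metric_marginals = Chebyshev())
mutualinfo(MIShannon(), est, x, y)
```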

KraskovStögbauerGrassberger2

CausalityTools.KraskovStögbauerGrassberger2 (Type)
KSG2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2 <: MutualInformationEstimator
KraskovStögbauerGrassberger2(; k::Int = 1, w = 0, metric_marginals = Chebyshev())

The KraskovStögbauerGrassberger2 mutual information estimator (you can use KSG2 for short) is the $I^{(2)}$ k-th nearest neighbor estimator from Kraskov et al. (2004).

Keyword arguments

  • k::Int: The number of nearest neighbors to consider. Only information about the k-th nearest neighbor is actually used.
  • metric_marginals: The distance metric for the marginals; can be any metric from Distances.jl. It defaults to metric_marginals = Chebyshev(), which is the same as in Kraskov et al. (2004).
  • w::Int: The Theiler window, which determines if temporal neighbors are excluded during neighbor searches in the joint space. Defaults to 0, meaning that only the point itself is excluded.

Description

Let the joint StateSpaceSet $X := \{\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m \}$ be defined by the concatenation of the marginal StateSpaceSets $\{ \bf{X}_k \}_{k=1}^m$, where each $\bf{X}_k$ is potentially multivariate. Let $\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N$ be the points in the joint space $X$.

The KraskovStögbauerGrassberger2 estimator first locates, for each $\bf{x}_i \in X$, the point $\bf{n}_i \in X$, the k-th nearest neighbor to $\bf{x}_i$, according to the maximum norm (Chebyshev metric). Let $\epsilon_i$ be the distance $d(\bf{x}_i, \bf{n}_i)$.

Consider $\bf{x}_i^m \in \bf{X}_m$, the $i$-th point in the marginal space $\bf{X}_m$. For each $\bf{x}_i^m$, we determine $\theta_i^m$ := the number of points $\bf{x}_k^m \in \bf{X}_m$ that are a distance less than $\epsilon_i$ away from $\bf{x}_i^m$. That is, we use the distance from a query point $\bf{x}_i \in X$ (in the joint space) to count neighbors of $\bf{x}_i^m \in \bf{X}_m$ (in the marginal space).

Mutual information between the variables $\bf{X}_1, \bf{X_2}, \ldots, \bf{X}_m$ is then estimated as

\[\hat{I}_{KSG2}(\bf{X}) = \psi{(k)} - \dfrac{m - 1}{k} + (m - 1)\psi{(N)} - \dfrac{1}{N} \sum_{i = 1}^N \sum_{j = 1}^m \psi{(\theta_i^j + 1)}\]

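A brief sketch; running KSG1 and KSG2 on the same data is a cheap consistency check, since the two estimators should give similar values:

```julia
using CausalityTools

x = randn(3000)
y = x .^ 2 .+ 0.5 .* randn(3000)   # nonlinearly coupled variables

mutualinfo(MIShannon(), KSG1(k = 5), x, y)
mutualinfo(MIShannon(), KSG2(k = 5), x, y)
```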

GaoKannanOhViswanath

CausalityTools.GaoKannanOhViswanath (Type)
GaoKannanOhViswanath <: MutualInformationEstimator
GaoKannanOhViswanath(; k = 1, w = 0)

The GaoKannanOhViswanath (Shannon) estimator is designed for estimating mutual information between variables that may be either discrete, continuous or a mixture of both (Gao et al., 2017).

Explicitly convert your discrete data to floats

Even though the GaoKannanOhViswanath estimator is designed to handle discrete data, our implementation demands that all input data are StateSpaceSets whose data points are floats. If you have discrete data, such as strings or symbols, encode them using integers and convert those integers to floats before passing them to mutualinfo.
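For example, a categorical series could be integer-encoded and converted to floats before estimation. The encoding below is an ad-hoc illustration, not part of the package API:

```julia
using CausalityTools

s = rand(["a", "b", "c"], 2000)   # discrete/categorical variable
y = randn(2000)                   # continuous variable

# Ad-hoc integer encoding of the categories, converted to floats.
levels = sort(unique(s))
codes = Dict(l => i for (i, l) in enumerate(levels))
x = float.([codes[si] for si in s])

mutualinfo(MIShannon(), GaoKannanOhViswanath(k = 5), x, y)
```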

Description

The estimator starts by expressing mutual information in terms of the Radon-Nikodym derivative, and then estimates these derivatives using k-nearest neighbor distances from empirical samples.

The estimator avoids the common issue of having to add noise to data before analysis due to tied points, which may bias other estimators. Citing their paper, the estimator "strongly outperforms natural baselines of discretizing the mixed random variables (by quantization) or making it continuous by adding a small Gaussian noise."

Implementation note

In Gao et al. (2017), they claim (roughly speaking) that the estimator reduces to the KraskovStögbauerGrassberger1 estimator for continuous-valued data. However, KraskovStögbauerGrassberger1 uses the digamma function, while GaoKannanOhViswanath uses the logarithm instead, so the estimators are not exactly equivalent for continuous data.

Moreover, in their algorithm 1, it is clearly not the case that the method falls back on the KSG1 approach. The KSG1 estimator uses k-th neighbor distances in the joint space, while the GaoKannanOhViswanath algorithm selects the maximum k-th nearest distances among the two marginal spaces, which are in general not the same as the k-th neighbor distance in the joint space (unless both marginals are univariate). Therefore, our implementation here differs slightly from algorithm 1 in GaoKannanOhViswanath. We have modified it in a way that mimics KraskovStögbauerGrassberger1 for continuous data. Note that because of using the log function instead of digamma, there will be slight differences between the methods. See the source code for more details.

See also: mutualinfo.


GaoOhViswanath

CausalityTools.GaoOhViswanath (Type)
GaoOhViswanath <: MutualInformationEstimator

The GaoOhViswanath mutual information estimator, also called the bias-improved-KSG estimator, or BI-KSG, by Gao et al. (2018), is given by

\[\begin{align*} \hat{I}_{GAO}(X; Y) &= \hat{H}_{KSG}(X) + \hat{H}_{KSG}(Y) - \hat{H}_{KZL}(X, Y) \\ &= \psi{(k)} + \log{(N)} + \log{ \left( \dfrac{c_{d_{x}, 2} c_{d_{y}, 2}}{c_{d_{x} + d_{y}, 2}} \right) } - \dfrac{1}{N} \sum_{i=1}^N \left( \log{(n_{x, i, 2})} + \log{(n_{y, i, 2})} \right), \end{align*}\]

where $c_{d, 2} = \dfrac{\pi^{\frac{d}{2}}}{\Gamma{(\dfrac{d}{2} + 1)}}$ is the volume of a $d$-dimensional unit $\ell_2$-ball.
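To make the constant concrete, the unit ball volume $c_{d, 2}$ can be evaluated directly (using the gamma function from SpecialFunctions.jl); the final call assumes GaoOhViswanath has a default zero-argument constructor:

```julia
using CausalityTools
using SpecialFunctions: gamma

# Volume of the d-dimensional unit ℓ₂-ball: c_{d,2} = π^(d/2) / Γ(d/2 + 1).
ball_volume(d) = π^(d / 2) / gamma(d / 2 + 1)
ball_volume(1), ball_volume(2), ball_volume(3)   # 2.0, π, 4π/3

x, y = randn(2000), randn(2000)
# Assumes a default constructor; check the docstring for available keywords.
mutualinfo(MIShannon(), GaoOhViswanath(), x, y)
```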
