Information measures (entropies and co.)
Be sure you have gone through the Tutorial before going through the API here to have a good idea of the terminology used in ComplexityMeasures.jl.
Information measures API
The information measure API is defined by the information
function, which takes as an input an InformationMeasure
, or some specialized DiscreteInfoEstimator
or DifferentialInfoEstimator
for estimating the discrete or differential variant of the measure. The functions information_maximum
and information_normalized
are also useful.
ComplexityMeasures.InformationMeasure
— TypeInformationMeasure
InformationMeasure
is the supertype of all information measure definitions.
In this package, we define "information measures" as functionals of probability mass functions ("discrete" measures), or of probability density functions ("differential" measures). Examples are (generalized) entropies such as Shannon
or Renyi
, or extropies like ShannonExtropy
. Amigó et al. (2018) provides a useful review of generalized entropies.
Used with
Any of the information measures listed below can be used with
information
, to compute a numerical value for the measure, given some input data.information_maximum
, to compute the maximum possible value for the measure.information_normalized
, to compute the normalized form of the measure (divided by the maximum possible value).
The information_maximum
/information_normalized
functions only works with the discrete version of the measure. See docstrings for the above functions for usage examples.
Implementations
Renyi
.Tsallis
.Shannon
, which is a subcase of the above two in the limitq → 1
.Kaniadakis
.Curado
.StretchedExponential
.RenyiExtropy
.TsallisExtropy
.ShannonExtropy
, which is a subcase of the above two in the limitq → 1
.FluctuationComplexity
.
Estimators
A particular information measure may have both a discrete and a continuous/differential definition, which are estimated using a DifferentialInfoEstimator
or a DifferentialInfoEstimator
, respectively.
ComplexityMeasures.information
— Methodinformation([die::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real
information(o::OutcomeSpace, x) → h::Real
Estimate a discrete information measure from input data x
using the provided DiscreteInfoEstimator
and ProbabilitiesEstimator
over the given OutcomeSpace
.
As an alternative, you can provide an InformationMeasure
for the first argument (die
) which will default to PlugIn
estimation) for the information estimation. You may also skip the first argument (die
), in which case Shannon()
will be used. You may also skip the second argument (est
), which will default to the RelativeAmount
probabilities estimator. Note that some information measure estimators (e.g., GeneralizedSchuermann
) operate directly on counts and hence ignore est
.
information([e::DiscreteInfoEstimator,] p::Probabilities) → h::Real
information([e::DiscreteInfoEstimator,] c::Counts) → h::Real
Like above, but estimate the information measure from the pre-computed Probabilities
p
or Counts
. Counts are converted into probabilities using RelativeAmount
, unless the estimator e
uses counts directly.
See also: information_maximum
, information_normalized
for a normalized version.
Examples (naive estimation)
The simplest way to estimate a discrete measure is to provide the InformationMeasure
directly in combination with an OutcomeSpace
. This will use the "naive" PlugIn
estimator for the measure, and the "naive" RelativeAmount
estimator for the probabilities.
x = randn(100) # some input data
o = ValueBinning(RectangularBinning(5)) # a 5-bin histogram outcome space
h_s = information(Shannon(), o, x)
Here are some more examples:
x = [rand(Bool) for _ in 1:10000] # coin toss
ps = probabilities(x) # gives about [0.5, 0.5] by definition
h = information(ps) # gives 1, about 1 bit by definition (Shannon entropy by default)
h = information(Shannon(), ps) # syntactically equivalent to the above
h = information(Shannon(), UniqueElements(), x) # syntactically equivalent to above
h = information(Renyi(2.0), ps) # also gives 1, order `q` doesn't matter for coin toss
h = information(OrdinalPatterns(;m=3), x) # gives about 2, again by definition
Examples (bias-corrected estimation)
It is known that both PlugIn
estimation for information measures and RelativeAmount
estimation for probabilities are biased. The scientific literature abounds with estimators that correct for this bias, both on the measure-estimation level and on the probability-estimation level. We thus provide the option to use any DiscreteInfoEstimator
in combination with any ProbabilitiesEstimator
for improved estimates. Note that custom probabilites estimators will only work with counting-compatible OutcomeSpace
.
x = randn(100)
o = ValueBinning(RectangularBinning(5))
# Estimate Shannon entropy estimation using various dedicated estimators
h_s = information(MillerMadow(Shannon()), RelativeAmount(), o, x)
h_s = information(HorvitzThompson(Shannon()), Shrinkage(), o, x)
h_s = information(Schuermann(Shannon()), Shrinkage(), o, x)
# Estimate information measures using the generic `Jackknife` estimator
h_r = information(Jackknife(Renyi()), Shrinkage(), o, x)
j_t = information(Jackknife(TsallisExtropy()), BayesianRegularization(), o, x)
j_r = information(Jackknife(RenyiExtropy()), RelativeAmount(), o, x)
ComplexityMeasures.information
— Methodinformation(est::DifferentialInfoEstimator, x) → h::Real
Estimate a differential information measure using the provided DifferentialInfoEstimator
and input data x
.
Description
The overwhelming majority of differential estimators estimate the Shannon
entropy. If the same estimator can estimate different information measures (e.g. it can estimate both Shannon
and Tsallis
), then the information measure is provided as an argument to the estimator itself.
See the table of differential information measure estimators in the docs for all differential information measure estimators.
Currently, unlike for the discrete information measures, this method doesn't involve explicitly first computing a probability density function and then passing this density to an information measure definition. But in the future, we want to establish a density
API similar to the probabilities
API.
Examples
To compute the differential version of a measure, give it as the first argument to a DifferentialInfoEstimator
and pass it to information
.
x = randn(1000)
h_sh = information(Kraskov(Shannon()), x)
h_vc = information(Vasicek(Shannon()), x)
A normal distribution has a base-e Shannon differential entropy of 0.5*log(2π) + 0.5
nats.
est = Kraskov(k = 5, base = ℯ) # Base `ℯ` for nats.
h = information(est, randn(2_000_000))
abs(h - 0.5*log(2π) - 0.5) # ≈ 0.0001
ComplexityMeasures.information_maximum
— Functioninformation_maximum(e::InformationMeasure, o::OutcomeSpace [, x])
Return the maximum value of the given information measure can have, given input data x
and the given outcome space (the OutcomeSpace
may also be specified by a ProbabilitiesEstimator
).
Like in outcome_space
, for some outcome spaces, the possible outcomes are known without knowledge of input x
, in which case the function dispatches to information_maximum(e, o)
.
information_maximum(e::InformationMeasure, L::Int)
The same as above, but computed directly from the number of total outcomes L
.
ComplexityMeasures.information_normalized
— Functioninformation_normalized([e::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real
Estimate the normalized version of the given discrete information measure, This is just the value of information
divided its maximum possible value given o
.
The same convenience syntaxes as in information
can be used here.
Notice that there is no method information_normalized(e::DiscreteInfoEstimator, probs::Probabilities)
, because there is no way to know the number of possible outcomes (i.e., the total_outcomes
) from probs
.
Normalized values
For the PlugIn
estimator, it is guaranteed that h̃ ∈ [0, 1]
. For any other estimator, we can't guarantee this, since the estimator might over-correct. You should know what you're doing if using anything but PlugIn
to estimate normalized values.
Entropies
ComplexityMeasures.entropy
— Functionentropy(args...)
entropy
is nothing more than a call to information
that will simply throw an error if used with an information measure that is not an entropy.
ComplexityMeasures.Shannon
— TypeShannon <: InformationMeasure
Shannon(; base = 2)
The Shannon (Shannon, 1948) entropy, used with information
to compute:
\[H(p) = - \sum_i p[i] \log(p[i])\]
with the $\log$ at the given base
.
The maximum value of the Shannon entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the total_outcomes
.
ComplexityMeasures.Renyi
— TypeRenyi <: InformationMeasure
Renyi(q, base = 2)
Renyi(; q = 1.0, base = 2)
The Rényi generalized order-q
entropy (Rényi, 1961), used with information
to compute an entropy with units given by base
(typically 2
or MathConstants.e
).
Description
Let $p$ be an array of probabilities (summing to 1). Then the Rényi generalized entropy is
\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]
and generalizes other known entropies, like e.g. the information entropy ($q = 1$, see Shannon (1948)), the maximum entropy ($q=0$, also known as Hartley entropy), or the correlation entropy ($q = 2$, also known as collision entropy).
The maximum value of the Rényi entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the total_outcomes
.
ComplexityMeasures.Tsallis
— TypeTsallis <: InformationMeasure
Tsallis(q; k = 1.0, base = 2)
Tsallis(; q = 1.0, k = 1.0, base = 2)
The Tsallis generalized order-q
entropy (Tsallis, 1988), used with information
to compute an entropy.
base
only applies in the limiting case q == 1
, in which the Tsallis entropy reduces to Shannon
entropy.
Description
The Tsallis entropy is a generalization of the Boltzmann-Gibbs entropy, with k
standing for the Boltzmann constant. It is defined as
\[S_q(p) = \frac{k}{q - 1}\left(1 - \sum_{i} p[i]^q\right)\]
The maximum value of the Tsallis entropy is $k(L^{1 - q} - 1)/(1 - q)$, with $L$ the total_outcomes
.
ComplexityMeasures.Kaniadakis
— TypeKaniadakis <: InformationMeasure
Kaniadakis(; κ = 1.0, base = 2.0)
The Kaniadakis entropy (Tsallis, 2009), used with information
to compute
\[H_K(p) = -\sum_{i=1}^N p_i f_\kappa(p_i),\]
\[f_\kappa (x) = \dfrac{x^\kappa - x^{-\kappa}}{2\kappa},\]
where if $\kappa = 0$, regular logarithm to the given base
is used, and 0 probabilities are skipped.
ComplexityMeasures.Curado
— TypeCurado <: InformationMeasure
Curado(; b = 1.0)
The Curado entropy (Curado and Nobre, 2004), used with information
to compute
\[H_C(p) = \left( \sum_{i=1}^N e^{-b p_i} \right) + e^{-b} - 1,\]
with b ∈ ℛ, b > 0
, and the terms outside the sum ensures that $H_C(0) = H_C(1) = 0$.
The maximum entropy for Curado
is $L(1 - \exp(-b/L)) + \exp(-b) - 1$ with $L$ the total_outcomes
.
ComplexityMeasures.StretchedExponential
— TypeStretchedExponential <: InformationMeasure
StretchedExponential(; η = 2.0, base = 2)
The stretched exponential, or Anteneodo-Plastino, entropy (Anteneodo and Plastino, 1999), used with information
to compute
\[S_{\eta}(p) = \sum_{i = 1}^N \Gamma \left( \dfrac{\eta + 1}{\eta}, - \log_{base}(p_i) \right) - p_i \Gamma \left( \dfrac{\eta + 1}{\eta} \right),\]
where $\eta \geq 0$, $\Gamma(\cdot, \cdot)$ is the upper incomplete Gamma function, and $\Gamma(\cdot) = \Gamma(\cdot, 0)$ is the Gamma function. Reduces to Shannon
entropy for η = 1.0
.
The maximum entropy for StrechedExponential
is a rather complicated expression involving incomplete Gamma functions (see source code).
Other information measures
ComplexityMeasures.ShannonExtropy
— TypeShannonExtropy <: InformationMeasure
ShannonExtropy(; base = 2)
The Shannon extropy (Lad et al., 2015), used with information
to compute
\[J(x) = -\sum_{i=1}^N (1 - p[i]) \log{(1 - p[i])},\]
for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base
.
ComplexityMeasures.RenyiExtropy
— TypeRenyiExtropy <: InformationMeasure
RenyiExtropy(; q = 1.0, base = 2)
The Rényi extropy (Liu and Xiao, 2023).
Description
RenyiExtropy
is used with information
to compute
\[J_R(P) = \dfrac{-(n - 1) \log{(n - 1)} + (n - 1) \log{ \left( \sum_{i=1}^N {(1 - p[i])}^q \right)} }{q - 1}\]
for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base
. Alternatively, RenyiExtropy
can be used with information_normalized
, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to to the maximal Rényi extropy, given by
\[J_R(P) = (N - 1)\log \left( \dfrac{n}{n-1} \right) .\]
ComplexityMeasures.TsallisExtropy
— TypeTsallisExtropy <: InformationMeasure
TsallisExtropy(; base = 2)
The Tsallis extropy (Xue and Deng, 2023).
Description
TsallisExtropy
is used with information
to compute
\[J_T(P) = k \dfrac{N - 1 - \sum_{i=1}^N ( 1 - p[i])^q}{q - 1}\]
for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given base
. Alternatively, TsallisExtropy
can be used with information_normalized
, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to to the maximal Tsallis extropy, given by
\[J_T(P) = \dfrac{(N - 1)N^{q - 1} - (N - 1)^q}{(q - 1)N^{q - 1}}\]
ComplexityMeasures.ElectronicEntropy
— TypeElectronicEntropy <: InformationMeasure
ElectronicEntropy(; h = Shannon(; base = 2), j = ShannonExtropy(; base = 2))
The "electronic entropy" measure is defined in discrete form in Lad et al. (2015) as
\[H_{EL}(p) = H_S(p) + J_S(P),\]
where $H_S(p)$ is the Shannon
entropy and $J_S(p)$ is the ShannonExtropy
extropy of the probability vector $p$.
ComplexityMeasures.FluctuationComplexity
— TypeFluctuationComplexity <: InformationMeasure
FluctuationComplexity(; definition = Shannon(; base = 2), base = 2)
The "fluctuation complexity" quantifies the standard deviation of the information content of the states $\omega_i$ around some summary statistic (InformationMeasure
) of a PMF. Specifically, given some outcome space $\Omega$ with outcomes $\omega_i \in \Omega$ and a probability mass function $p(\Omega) = \{ p(\omega_i) \}_{i=1}^N$, it is defined as
\[\sigma_I(p) := \sqrt{\sum_{i=1}^N p_i(I_i - H_*)^2}\]
where $I_i = -\log_{base}(p_i)$ is the information content of the i-th outcome. The type of information measure $*$ is controlled by definition
.
The base
controls the base of the logarithm that goes into the information content terms. Make sure that you pick a base
that is consistent with the base chosen for the definition
(relevant for e.g. Shannon
).
Properties
If definition
is the Shannon
entropy, then we recover the Shannon-type information fluctuation complexity from (Bates and Shepard, 1993). Then the fluctuation complexity is zero for PMFs with only a single non-zero element, or for the uniform distribution.
If definition
is not Shannon entropy, then the properties of the measure varies, and does not necessarily share the properties (Bates and Shepard, 1993).
As far as we know, using other information measures besides Shannon entropy for the fluctuation complexity hasn't been explored in the literature yet. Our implementation, however, allows for it. Please inform us if you try some new combinations!
Discrete information estimators
ComplexityMeasures.DiscreteInfoEstimator
— TypeDiscreteInfoEstimator
The supertype of all discrete information measure estimators, which are used in combination with a ProbabilitiesEstimator
as input to information
or related functions.
The first argument to a discrete estimator is always an InformationMeasure
(defaults to Shannon
).
Description
A discrete InformationMeasure
is a functional of a probability mass function. To estimate such a measure from data, we must first estimate a probability mass function using a ProbabilitiesEstimator
from the (encoded/discretized) input data, and then apply the estimator to the estimated probabilities. For example, the Shannon
entropy is typically computed using the RelativeAmount
estimator to compute probabilities, which are then given to the PlugIn
estimator. Many other estimators exist, not only for Shannon
entropy, but other information measures as well.
We provide a library of both generic estimators such as PlugIn
or Jackknife
(which can be applied to any measure), as well as dedicated estimators such as MillerMadow
, which computes Shannon
entropy using the Miller-Madow bias correction. The list below gives a complete overview.
Implementations
The following estimators are generic and can compute any InformationMeasure
.
PlugIn
. The default, generic plug-in estimator of any information measure. It computes the measure exactly as stated in the definition, using the computed probability mass function.Jackknife
. Uses the a combination of the plug-in estimator and the jackknife principle to estimate the information measure.
Shannon
entropy estimators
The following estimators are dedicated Shannon
entropy estimators, which provide improvements over the naive PlugIn
estimator.
Any of the implemented DiscreteInfoEstimator
s can be used in combination with anyProbabilitiesEstimator
as input to information
. What this means is that every estimator actually comes in many different variants - one for each ProbabilitiesEstimator
. For example, the MillerMadow
estimator of Shannon
entropy is typically calculated with RelativeAmount
probabilities. But here, you can use for example the BayesianRegularization
or the Shrinkage
probabilities estimators instead, i.e. information(MillerMadow(), RelativeAmount(outcome_space), x)
and information(MillerMadow(), BayesianRegularization(outcomes_space), x)
are distinct estimators. This holds for all DiscreteInfoEstimator
s. Many of these estimators haven't been explored in the literature before, so feel free to explore, and please cite this software if you use it to explore some new estimator combination!
ComplexityMeasures.PlugIn
— TypePlugIn(e::InformationMeasure) <: DiscreteInfoEstimatorGeneric
The PlugIn
estimator is also called the empirical/naive/"maximum likelihood" estimator, and is used with information
to any discrete InformationMeasure
.
It computes any quantity exactly as given by its formula. When computing an information measure, which here is defined as a probabilities functional, it computes the quantity directly from a probability mass function, which is derived from maximum-likelihood (RelativeAmount
estimates of the probabilities.
Bias of plug-in estimates
The plugin-estimator of Shannon
entropy underestimates the true entropy, with a bias that grows with the number of distinct outcomes
(Arora et al., 2022)(Arora et al., 2022),
\[bias(H_S^{plugin}) = -\dfrac{K-1}{2N} + o(N^-1).\]
where K
is the number of distinct outcomes, and N
is the sample size. Many authors have tried to remedy this by proposing alternative Shannon entropy estimators. For example, the MillerMadow
estimator is a simple correction to the plug-in estimator that adds back the bias term above. Many other estimators exist; see DiscreteInfoEstimator
s for an overview.
ComplexityMeasures.MillerMadow
— TypeMillerMadow <: DiscreteInfoEstimatorShannon
MillerMadow(measure::Shannon = Shannon())
The MillerMadow
estimator is used with information
to compute the discrete Shannon
entropy according to Miller (1955).
Description
The Miller-Madow estimator of Shannon entropy is given by
\[H_S^{MM} = H_S^{plugin} + \dfrac{m - 1}{2N},\]
where $H_S^{plugin}$ is the Shannon entropy estimated using the PlugIn
estimator, m
is the number of bins with nonzero probability (as defined in Paninski (2003)), and N
is the number of observations.
ComplexityMeasures.Schuermann
— TypeSchuermann <: DiscreteInfoEstimatorShannon
Schuermann(definition::Shannon; a = 1.0)
The Schuermann
estimator is used with information
to compute the discrete Shannon
entropy with the bias-corrected estimator given in Schürmann (2004).
See detailed description for GeneralizedSchuermann
for details.
ComplexityMeasures.GeneralizedSchuermann
— TypeGeneralizedSchuermann <: DiscreteInfoEstimatorShannon
GeneralizedSchuermann(definition = Shannon(); a = 1.0)
The GeneralizedSchuermann
estimator is used with information
to compute the discrete Shannon
entropy with the bias-corrected estimator given in Grassberger (2022).
The "generalized" part of the name, as opposed to the Schürmann (2004) estimator (Schuermann
), is due to the possibility of picking difference parameters $a_i$ for different outcomes. If different parameters are assigned to the different outcomes, a
must be a vector of parameters of length length(outcomes)
, where the outcomes are obtained using outcomes
. See Grassberger (2022) for more information. If a
is a real number, then $a_i = a \forall i$, and the estimator reduces to the Schuermann
estimator.
Description
For a set of $N$ observations over $M$ outcomes, the estimator is given by
\[H_S^{opt} = \varphi(N) - \dfrac{1}{N} \sum_{i=1}^M n_i G_{n_i}(a_i),\]
where $n_i$ is the observed frequency of the i-th outcome,
\[G_n(a) = \varphi(n) + (-1)^n \int_0^a \dfrac{x^{n - 1}}{x + 1} dx,\]
$G_n(1) = G_n$ and $G_n(0) = \varphi(n)$, and
\[G_n = \varphi(n) + (-1)^n \int_0^1 \dfrac{x^{n - 1}}{x + 1} dx.\]
ComplexityMeasures.Jackknife
— TypeJackknife <: DiscreteInfoEstimatorGeneric
Jackknife(definition::InformationMeasure = Shannon())
The Jackknife
estimator is used with information
to compute any discrete InformationMeasure
.
The Jackknife
estimator uses the generic jackknife principle to reduce bias. Zahl (1977) was the first to apply the jaccknife technique in the context of Shannon
entropy estimation. Here, we've generalized his estimator to work with any InformationMeasure
.
Description
As an example of the jackknife technique, here is the formula for a jackknife estimate of Shannon
entropy
\[H_S^{J} = N H_S^{plugin} - \dfrac{N-1}{N} \sum_{i=1}^N {H_S^{plugin}}^{-\{i\}},\]
where $N$ is the sample size, $H_S^{plugin}$ is the plugin estimate of Shannon entropy, and ${H_S^{plugin}}^{-\{i\}}$ is the plugin estimate, but computed with the $i$-th sample left out.
ComplexityMeasures.HorvitzThompson
— TypeHorvitzThompson <: DiscreteInfoEstimatorShannon
HorvitzThompson(measure::Shannon = Shannon())
The HorvitzThompson
estimator is used with information
to compute the discrete Shannon
entropy according to Horvitz and Thompson (1952).
Description
The Horvitz-Thompson estimator of Shannon
entropy is given by
\[H_S^{HT} = -\sum_{i=1}^M \dfrac{p_i \log(p_i) }{1 - (1 - p_i)^N},\]
where $N$ is the sample size and $M$ is the number of outcomes
. Given the true probability $p_i$ of the $i$-th outcome, $1 - (1 - p_i)^N$ is the probability that the outcome appears at least once in a sample of size $N$ (Arora et al., 2022). Dividing by this inclusion probability is a form of weighting, and compensates for situations where certain outcomes have so low probabilities that they are not often observed in a sample, for example in power-law distributions.
ComplexityMeasures.ChaoShen
— TypeChaoShen <: DiscreteInfoEstimatorShannon
ChaoShen(definition::Shannon = Shannon())
The ChaoShen
estimator is used with information
to compute the discrete Shannon
entropy according to Chao and Shen (2003).
Description
This estimator is a modification of the HorvitzThompson
estimator that multiplies each plugin probability estimate by an estimate of sample coverage. If $f_1$ is the number of singletons (outcomes that occur only once) in a sample of length $N$, then the sample coverage is $C = 1 - \dfrac{f_1}{N}$. The Chao-Shen estimator of Shannon entropy is then
\[H_S^{CS} = -\sum_{i=1}^M \left( \dfrac{C p_i \log(C p_i)}{1 - (1 - C p_i)^N} \right),\]
where $N$ is the sample size and $M$ is the number of outcomes
. If $f_1 = N$, then $f_1$ is set to $f_1 = N - 1$ to ensure positive entropy (Arora et al., 2022).
Differential information estimators
ComplexityMeasures.DifferentialInfoEstimator
— TypeDifferentialInfoEstimator
The supertype of all differential information measure estimators. These estimators compute an information measure in various ways that do not involve explicitly estimating a probability distribution.
Each DifferentialInfoEstimator
s uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, Kraskov
estimates the Shannon
entropy.
See information
for usage.
Implementations
ComplexityMeasures.Kraskov
— TypeKraskov <: DifferentialInfoEstimator
Kraskov(definition = Shannon(); k::Int = 1, w::Int = 0)
The Kraskov
estimator computes the Shannon
differential information
of a multi-dimensional StateSpaceSet
using the k
-th nearest neighbor searches method from Kraskov et al. (2004), with logarithms to the base
specified in definition
.
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. Kraskov
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
See also: information
, KozachenkoLeonenko
, DifferentialInfoEstimator
.
ComplexityMeasures.KozachenkoLeonenko
— TypeKozachenkoLeonenko <: DifferentialInfoEstimator
KozachenkoLeonenko(definition = Shannon(); w::Int = 0)
The KozachenkoLeonenko
estimator (Kozachenko and Leonenko, 1987) computes the Shannon
differential information
of a multi-dimensional StateSpaceSet
, with logarithms to the base
specified in definition
.
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. KozachenkoLeonenko
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]
using the nearest neighbor method from Kozachenko and Leonenko (1987), as described in Charzyńska and Gambin (2016).
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
In contrast to Kraskov
, this estimator uses only the closest neighbor.
See also: information
, Kraskov
, DifferentialInfoEstimator
.
ComplexityMeasures.Zhu
— TypeZhu <: DifferentialInfoEstimator
Zhu(; definition = Shannon(), k = 1, w = 0)
The Zhu
estimator (Zhu et al., 2015) is an extension to KozachenkoLeonenko
, and computes the Shannon
differential information
of a multi-dimensional StateSpaceSet
, with logarithms to the base
specified in definition
.
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. Zhu
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]
by approximating densities within hyperrectangles surrounding each point xᵢ ∈ x
using using k
nearest neighbor searches. w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
See also: information
, KozachenkoLeonenko
, DifferentialInfoEstimator
.
ComplexityMeasures.ZhuSingh
— TypeZhuSingh <: DifferentialInfoEstimator
ZhuSingh(definition = Shannon(); k = 1, w = 0)
The ZhuSingh
estimator (Zhu et al., 2015) computes the Shannon
differential information
of a multi-dimensional StateSpaceSet
, with logarithms to the base
specified in definition
.
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. ZhuSingh
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
Like Zhu
, this estimator approximates probabilities within hyperrectangles surrounding each point xᵢ ∈ x
using using k
nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension to the entropy estimator in Singh et al. (2003).
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
See also: information
, DifferentialInfoEstimator
.
ComplexityMeasures.Gao
— TypeGao <: DifferentialInfoEstimator
Gao(definition = Shannon(); k = 1, w = 0, corrected = true)
The Gao
estimator (Gao et al., 09–12 May 2015) computes the Shannon
differential information
, using a k
-th nearest-neighbor approach based on Singh et al. (2003), with logarithms to the base
specified in definition
.
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
Gao et al. (09–12 May 2015) give two variants of this estimator. If corrected == false
, then the uncorrected version is used. If corrected == true
, then the corrected version is used, which ensures that the estimator is asymptotically unbiased.
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. KozachenkoLeonenko
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
ComplexityMeasures.Goria
— TypeGoria <: DifferentialInfoEstimator
Goria(measure = Shannon(); k = 1, w = 0)
The Goria
estimator (Goria et al., 2005) computes the Shannon
differential information
of a multi-dimensional StateSpaceSet
, with logarithms to the base
specified in definition
.
Description
Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function$f : \mathbb{R}^d \to \mathbb{R}$. Goria
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
Specifically, let $\bf{n}_1, \bf{n}_2, \ldots, \bf{n}_N$ be the distance of the samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ to their k
-th nearest neighbors. Next, let the geometric mean of the distances be
\[\hat{\rho}_k = \left( \prod_{i=1}^N \right)^{\dfrac{1}{N}}\]
Goria et al. (2005)'s estimate of Shannon differential entropy is then
\[\hat{H} = m\hat{\rho}_k + \log(N - 1) - \psi(k) + \log c_1(m),\]
where $c_1(m) = \dfrac{2\pi^\frac{m}{2}}{m \Gamma(m/2)}$ and $\psi$ is the digamma function.
ComplexityMeasures.Lord
— TypeLord <: DifferentialInfoEstimator
Lord(measure = Shannon(); k = 10, w = 0)
The Lord
estimator (Lord et al., 2018) estimates the Shannon
differential information
using a nearest neighbor approach with a local nonuniformity correction (LNC), with logarithms to the base
specified in definition
.
w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
Description
Assume we have samples $\bar{X} = \{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. Lord
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))],\]
by using the resubstitution formula
\[\hat{\bar{X}, k} = -\mathbb{E}[\log(f(X))] \approx \sum_{i = 1}^N \log(\hat{f}(\bf{x}_i)),\]
where $\hat{f}(\bf{x}_i)$ is an estimate of the density at $\bf{x}_i$ constructed in a manner such that $\hat{f}(\bf{x}_i) \propto \dfrac{k(x_i) / N}{V_i}$, where $k(x_i)$ is the number of points in the neighborhood of $\bf{x}_i$, and $V_i$ is the volume of that neighborhood.
While most nearest-neighbor based differential entropy estimators uses regular volume elements (e.g. hypercubes, hyperrectangles, hyperspheres) for approximating the local densities $\hat{f}(\bf{x}_i)$, the Lord
estimator uses hyperellopsoid volume elements. These hyperellipsoids are, for each query point xᵢ
, estimated using singular value decomposition (SVD) on the k
-th nearest neighbors of xᵢ
. Thus, the hyperellipsoids stretch/compress in response to the local geometry around each sample point. This makes Lord
a well-suited entropy estimator for a wide range of systems.
ComplexityMeasures.LeonenkoProzantoSavani
— TypeLeonenkoProzantoSavani <: DifferentialInfoEstimator
LeonenkoProzantoSavani(definition = Shannon(); k = 1, w = 0)
The LeonenkoProzantoSavani
estimator (Leonenko et al., 2008) computes the Shannon
, Renyi
, or Tsallis
differential information
of a multi-dimensional StateSpaceSet
, with logarithms to the base
specified in definition
.
Description
The estimator uses k
-th nearest-neighbor searches. w
is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to 0
, meaning that only the point itself is excluded when searching for neighbours).
For details, see Leonenko et al. (2008).
ComplexityMeasures.Vasicek
— TypeVasicek <: DifferentialInfoEstimator
Vasicek(definition = Shannon(); m::Int = 1)
The Vasicek
estimator computes the Shannon
differential information
of a timeseries using the method from Vasicek (1976), with logarithms to the base
specified in definition
.
The Vasicek
estimator belongs to a class of differential entropy estimators based on order statistics, of which Vasicek (1976) was the first. It only works for timeseries input.
Description
Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. Vasicek
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,
\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]
This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The Vasicek
Shannon
differential entropy estimate is then
\[\hat{H}_V(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{2m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right]\]
Usage
In practice, choice of m
influences how fast the entropy converges to the true value. For small value of m
, convergence is slow, so we recommend to scale m
according to the time series length n
and use m >= n/100
(this is just a heuristic based on the tests written for this package).
See also: information
, Correa
, AlizadehArghami
, Ebrahimi
, DifferentialInfoEstimator
.
ComplexityMeasures.AlizadehArghami
— TypeAlizadehArghami <: DifferentialInfoEstimator
AlizadehArghami(definition = Shannon(); m::Int = 1)
The AlizadehArghami
estimator computes the Shannon
differential information
of a timeseries using the method from Alizadeh and Arghami (2010), with logarithms to the base
specified in definition
.
The AlizadehArghami
estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.
Description
Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. AlizadehArghami
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$:
\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]
This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The AlizadehArghami
Shannon
differential entropy estimate is then the the Vasicek
estimate $\hat{H}_{V}(\bar{X}, m, n)$, plus a correction factor
\[\hat{H}_{A}(\bar{X}, m, n) = \hat{H}_{V}(\bar{X}, m, n) + \dfrac{2}{n}\left(m \log(2) \right).\]
See also: information
, Correa
, Ebrahimi
, Vasicek
, DifferentialInfoEstimator
.
ComplexityMeasures.Ebrahimi
— TypeEbrahimi <: DifferentialInfoEstimator
Ebrahimi(definition = Shannon(); m::Int = 1)
The Ebrahimi
estimator computes the Shannon
information
of a timeseries using the method from Ebrahimi et al. (1994), with logarithms to the base
specified in definition
.
The Ebrahimi
estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.
Description
Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. Ebrahimi
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,
\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]
This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The Ebrahimi
Shannon
differential entropy estimate is then
\[\hat{H}_{E}(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{c_i m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right],\]
where
\[c_i = \begin{cases} 1 + \frac{i - 1}{m}, & 1 \geq i \geq m \\ 2, & m + 1 \geq i \geq n - m \\ 1 + \frac{n - i}{m} & n - m + 1 \geq i \geq n \end{cases}.\]
See also: information
, Correa
, AlizadehArghami
, Vasicek
, DifferentialInfoEstimator
.
ComplexityMeasures.Correa
— TypeCorrea <: DifferentialInfoEstimator
Correa(definition = Shannon(); m::Int = 1)
The Correa
estimator computes the Shannon
differential information
of a timeseries using the method from Correa (1995), with logarithms to the base
specified in definition
.
The Correa
estimator belongs to a class of differential entropy estimators based on order statistics. It only works for timeseries input.
Description
Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function$f : \mathbb{R} \to \mathbb{R}$. Correa
estimates the Shannon
differential entropy
\[H(X) = \int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]
However, instead of estimating the above integral directly, Correa
makes use of the equivalent integral, where $F$ is the distribution function for $X$,
\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp\]
This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$, ensuring that end points are included. The Correa
estimate of Shannon
differential entropy is then
\[H_C(\bar{X}, m, n) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{ \sum_{j=i-m}^{i+m}(\bar{X}_{(j)} - \tilde{X}_{(i)})(j - i)}{n \sum_{j=i-m}^{i+m} (\bar{X}_{(j)} - \tilde{X}_{(i)})^2} \right],\]
where
\[\tilde{X}_{(i)} = \dfrac{1}{2m + 1} \sum_{j = i - m}^{i + m} X_{(j)}.\]
See also: information
, AlizadehArghami
, Ebrahimi
, Vasicek
, DifferentialInfoEstimator
.
Table of differential information measure estimators
The following estimators are differential information measure estimators, and can also be used with information
.
Each DifferentialInfoEstimator
s uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, Kraskov
estimates the Shannon
entropy.
Estimator | Principle | Input data | Shannon | Renyi | Tsallis | Kaniadakis | Curado | StretchedExponential |
---|---|---|---|---|---|---|---|---|
KozachenkoLeonenko | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Kraskov | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Zhu | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
ZhuSingh | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Gao | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Goria | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Lord | Nearest neighbors | StateSpaceSet | ✓ | x | x | x | x | x |
Vasicek | Order statistics | Vector | ✓ | x | x | x | x | x |
Ebrahimi | Order statistics | Vector | ✓ | x | x | x | x | x |
Correa | Order statistics | Vector | ✓ | x | x | x | x | x |
AlizadehArghami | Order statistics | Vector | ✓ | x | x | x | x | x |