# Information measures (entropies and co.)

Be sure you have gone through the Tutorial before reading the API reference here, so that you have a good idea of the terminology used in ComplexityMeasures.jl.

## Information measures API

The information measure API is defined by the `information` function, which takes as input an `InformationMeasure`, or some specialized `DiscreteInfoEstimator` or `DifferentialInfoEstimator` for estimating the discrete or differential variant of the measure. The functions `information_maximum` and `information_normalized` are also useful.

`ComplexityMeasures.InformationMeasure` — Type

`InformationMeasure`

`InformationMeasure` is the supertype of all information measure *definitions*.

In this package, we define "information measures" as functionals of probability mass functions ("discrete" measures), or of probability density functions ("differential" measures). Examples are (generalized) entropies such as `Shannon` or `Renyi`, or extropies like `ShannonExtropy`. Amigó *et al.* (2018) provides a useful review of generalized entropies.

**Used with**

Any of the information measures listed below can be used with

- `information`, to compute a numerical value for the measure, given some input data.
- `information_maximum`, to compute the maximum possible value for the measure.
- `information_normalized`, to compute the normalized form of the measure (divided by the maximum possible value).

The `information_maximum`/`information_normalized` functions only work with the discrete version of the measure. See the docstrings of the above functions for usage examples.
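As a minimal sketch of how these three functions fit together (the data and the choice of `UniqueElements` outcome space are illustrative assumptions):

```
x = rand(1:4, 10_000)                             # data over 4 possible outcomes
o = UniqueElements()                              # outcome space: the distinct values
h     = information(Shannon(), o, x)              # close to the maximum for roughly uniform data
h_max = information_maximum(Shannon(), o, x)      # log2(4) = 2 bits
h_n   = information_normalized(Shannon(), o, x)   # = h / h_max, in [0, 1]
```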

**Implementations**

- `Renyi`.
- `Tsallis`.
- `Shannon`, which is a subcase of the above two in the limit `q → 1`.
- `Kaniadakis`.
- `Curado`.
- `StretchedExponential`.
- `RenyiExtropy`.
- `TsallisExtropy`.
- `ShannonExtropy`, which is a subcase of the above two in the limit `q → 1`.
- `FluctuationComplexity`.

**Estimators**

A particular information measure may have both a discrete and a continuous/differential definition, which are estimated using a `DiscreteInfoEstimator` or a `DifferentialInfoEstimator`, respectively.

`ComplexityMeasures.information` — Method

```
information([die::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real
information(o::OutcomeSpace, x) → h::Real
```

Estimate a discrete information measure from input data `x`, using the provided `DiscreteInfoEstimator` and `ProbabilitiesEstimator` over the given `OutcomeSpace`.

As an alternative, you can provide an `InformationMeasure` for the first argument (`die`), in which case `PlugIn` estimation is used. You may also skip the first argument (`die`), in which case `Shannon()` will be used. You may also skip the second argument (`est`), which will default to the `RelativeAmount` probabilities estimator. Note that some information measure estimators (e.g., `GeneralizedSchuermann`) operate directly on counts and hence ignore `est`.

```
information([e::DiscreteInfoEstimator,] p::Probabilities) → h::Real
information([e::DiscreteInfoEstimator,] c::Counts) → h::Real
```

Like above, but estimate the information measure from the pre-computed `Probabilities` `p` or `Counts` `c`. Counts are converted into probabilities using `RelativeAmount`, unless the estimator `e` uses counts directly.

See also: `information_maximum`, `information_normalized` for a normalized version.

**Examples (naive estimation)**

The simplest way to estimate a discrete measure is to provide the `InformationMeasure` directly in combination with an `OutcomeSpace`. This will use the "naive" `PlugIn` estimator for the measure, and the "naive" `RelativeAmount` estimator for the probabilities.

```
x = randn(100) # some input data
o = ValueBinning(RectangularBinning(5)) # a 5-bin histogram outcome space
h_s = information(Shannon(), o, x)
```

Here are some more examples:

```
x = [rand(Bool) for _ in 1:10000] # coin toss
ps = probabilities(x) # gives about [0.5, 0.5] by definition
h = information(ps) # gives 1, about 1 bit by definition (Shannon entropy by default)
h = information(Shannon(), ps) # syntactically equivalent to the above
h = information(Shannon(), UniqueElements(), x) # syntactically equivalent to above
h = information(Renyi(2.0), ps) # also gives 1, order `q` doesn't matter for coin toss
h = information(OrdinalPatterns(;m=3), x) # gives about 2, again by definition
```

**Examples (bias-corrected estimation)**

It is known that both `PlugIn` estimation for information measures and `RelativeAmount` estimation for probabilities are biased. The scientific literature abounds with estimators that correct for this bias, both on the measure-estimation level and on the probability-estimation level. We thus provide the option to use any `DiscreteInfoEstimator` in combination with any `ProbabilitiesEstimator` for improved estimates. Note that custom probabilities estimators will only work with counting-compatible `OutcomeSpace`s.

```
x = randn(100)
o = ValueBinning(RectangularBinning(5))
# Estimate Shannon entropy using various dedicated estimators
h_s = information(MillerMadow(Shannon()), RelativeAmount(), o, x)
h_s = information(HorvitzThompson(Shannon()), Shrinkage(), o, x)
h_s = information(Schuermann(Shannon()), Shrinkage(), o, x)
# Estimate information measures using the generic `Jackknife` estimator
h_r = information(Jackknife(Renyi()), Shrinkage(), o, x)
j_t = information(Jackknife(TsallisExtropy()), BayesianRegularization(), o, x)
j_r = information(Jackknife(RenyiExtropy()), RelativeAmount(), o, x)
```

`ComplexityMeasures.information` — Method

`information(est::DifferentialInfoEstimator, x) → h::Real`

Estimate a **differential information measure** using the provided `DifferentialInfoEstimator` and input data `x`.

**Description**

The overwhelming majority of differential estimators estimate the `Shannon` entropy. If the same estimator can estimate different information measures (e.g. it can estimate both `Shannon` and `Tsallis`), then the information measure is provided as an argument to the estimator itself.

See the table of differential information measure estimators below for an overview.

Currently, unlike for the discrete information measures, this method doesn't involve explicitly first computing a probability density function and then passing this density to an information measure definition. But in the future, we want to establish a `density` API similar to the `probabilities` API.

**Examples**

To compute the differential version of a measure, give it as the first argument to a `DifferentialInfoEstimator` and pass it to `information`.

```
x = randn(1000)
h_sh = information(Kraskov(Shannon()), x)
h_vc = information(Vasicek(Shannon()), x)
```

A normal distribution has a base-e Shannon differential entropy of `0.5*log(2π) + 0.5` nats.

```
est = Kraskov(Shannon(; base = ℯ); k = 5) # Base `ℯ` for nats.
h = information(est, randn(2_000_000))
abs(h - 0.5*log(2π) - 0.5) # ≈ 0.0001
```

`ComplexityMeasures.information_maximum` — Function

`information_maximum(e::InformationMeasure, o::OutcomeSpace [, x])`

Return the maximum value the given information measure can have, given input data `x` and the given outcome space (the `OutcomeSpace` may also be specified by a `ProbabilitiesEstimator`).

Like in `outcome_space`, for some outcome spaces the possible outcomes are known without knowledge of the input `x`, in which case the function dispatches to `information_maximum(e, o)`.

`information_maximum(e::InformationMeasure, L::Int)`

The same as above, but computed directly from the total number of outcomes `L`.

`ComplexityMeasures.information_normalized` — Function

`information_normalized([e::DiscreteInfoEstimator,] [est::ProbabilitiesEstimator,] o::OutcomeSpace, x) → h::Real`

Estimate the normalized version of the given discrete information measure. This is just the value of `information` divided by its maximum possible value given `o`.

The same convenience syntaxes as in `information` can be used here.

Notice that there is no method `information_normalized(e::DiscreteInfoEstimator, probs::Probabilities)`, because there is no way to know the number of *possible* outcomes (i.e., the `total_outcomes`) from `probs`.

**Normalized values**

For the `PlugIn` estimator, it is guaranteed that `h̃ ∈ [0, 1]`. For any other estimator, we can't guarantee this, since the estimator might over-correct. You should know what you're doing if using anything but `PlugIn` to estimate normalized values.

## Entropies

`ComplexityMeasures.entropy` — Function

`entropy(args...)`

`entropy` is nothing more than a call to `information` that will simply throw an error if used with an information measure that is not an entropy.

`ComplexityMeasures.Shannon` — Type

```
Shannon <: InformationMeasure
Shannon(; base = 2)
```

The Shannon entropy (Shannon, 1948), used with `information` to compute:

\[H(p) = - \sum_i p[i] \log(p[i])\]

with the $\log$ at the given `base`.

The maximum value of the Shannon entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the `total_outcomes`.

`ComplexityMeasures.Renyi` — Type

```
Renyi <: InformationMeasure
Renyi(q, base = 2)
Renyi(; q = 1.0, base = 2)
```

The Rényi generalized order-`q` entropy (Rényi, 1961), used with `information` to compute an entropy with units given by `base` (typically `2` or `MathConstants.e`).

**Description**

Let $p$ be an array of probabilities (summing to 1). Then the Rényi generalized entropy is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, e.g. the information entropy ($q = 1$, see Shannon (1948)), the maximum entropy ($q = 0$, also known as Hartley entropy), and the correlation entropy ($q = 2$, also known as collision entropy).

The maximum value of the Rényi entropy is $\log_{base}(L)$, which is the entropy of the uniform distribution with $L$ the `total_outcomes`.

`ComplexityMeasures.Tsallis` — Type

```
Tsallis <: InformationMeasure
Tsallis(q; k = 1.0, base = 2)
Tsallis(; q = 1.0, k = 1.0, base = 2)
```

The Tsallis generalized order-`q` entropy (Tsallis, 1988), used with `information` to compute an entropy.

`base` only applies in the limiting case `q == 1`, in which the Tsallis entropy reduces to `Shannon` entropy.

**Description**

The Tsallis entropy is a generalization of the Boltzmann-Gibbs entropy, with `k` standing for the Boltzmann constant. It is defined as

\[S_q(p) = \frac{k}{q - 1}\left(1 - \sum_{i} p[i]^q\right)\]

The maximum value of the Tsallis entropy is $k(L^{1 - q} - 1)/(1 - q)$, with $L$ the `total_outcomes`.

`ComplexityMeasures.Kaniadakis` — Type

```
Kaniadakis <: InformationMeasure
Kaniadakis(; κ = 1.0, base = 2.0)
```

The Kaniadakis entropy (Tsallis, 2009), used with `information` to compute

\[H_K(p) = -\sum_{i=1}^N p_i f_\kappa(p_i),\]

\[f_\kappa (x) = \dfrac{x^\kappa - x^{-\kappa}}{2\kappa},\]

where, if $\kappa = 0$, the regular logarithm to the given `base` is used, and 0 probabilities are skipped.

`ComplexityMeasures.Curado` — Type

```
Curado <: InformationMeasure
Curado(; b = 1.0)
```

The Curado entropy (Curado and Nobre, 2004), used with `information` to compute

\[H_C(p) = \left( \sum_{i=1}^N e^{-b p_i} \right) + e^{-b} - 1,\]

with `b ∈ ℛ, b > 0`, where the terms outside the sum ensure that $H_C(0) = H_C(1) = 0$.

The maximum entropy for `Curado` is $L(1 - \exp(-b/L)) + \exp(-b) - 1$, with $L$ the `total_outcomes`.

`ComplexityMeasures.StretchedExponential` — Type

```
StretchedExponential <: InformationMeasure
StretchedExponential(; η = 2.0, base = 2)
```

The stretched exponential, or Anteneodo-Plastino, entropy (Anteneodo and Plastino, 1999), used with `information` to compute

\[S_{\eta}(p) = \sum_{i = 1}^N \Gamma \left( \dfrac{\eta + 1}{\eta}, - \log_{base}(p_i) \right) - p_i \Gamma \left( \dfrac{\eta + 1}{\eta} \right),\]

where $\eta \geq 0$, $\Gamma(\cdot, \cdot)$ is the upper incomplete Gamma function, and $\Gamma(\cdot) = \Gamma(\cdot, 0)$ is the Gamma function. Reduces to `Shannon` entropy for `η = 1.0`.

The maximum entropy for `StretchedExponential` is a rather complicated expression involving incomplete Gamma functions (see source code).

## Other information measures

`ComplexityMeasures.ShannonExtropy` — Type

```
ShannonExtropy <: InformationMeasure
ShannonExtropy(; base = 2)
```

The Shannon extropy (Lad *et al.*, 2015), used with `information` to compute

\[J(x) = -\sum_{i=1}^N (1 - p[i]) \log{(1 - p[i])},\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`.

`ComplexityMeasures.RenyiExtropy` — Type

```
RenyiExtropy <: InformationMeasure
RenyiExtropy(; q = 1.0, base = 2)
```

The Rényi extropy (Liu and Xiao, 2023).

**Description**

`RenyiExtropy` is used with `information` to compute

\[J_R(P) = \dfrac{-(N - 1) \log{(N - 1)} + (N - 1) \log{ \left( \sum_{i=1}^N {(1 - p[i])}^q \right)} }{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`. Alternatively, `RenyiExtropy` can be used with `information_normalized`, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to the maximal Rényi extropy, given by

\[J_R(P) = (N - 1)\log \left( \dfrac{N}{N-1} \right) .\]

`ComplexityMeasures.TsallisExtropy` — Type

```
TsallisExtropy <: InformationMeasure
TsallisExtropy(; base = 2)
```

The Tsallis extropy (Xue and Deng, 2023).

**Description**

`TsallisExtropy` is used with `information` to compute

\[J_T(P) = k \dfrac{N - 1 - \sum_{i=1}^N ( 1 - p[i])^q}{q - 1}\]

for a probability distribution $P = \{p_1, p_2, \ldots, p_N\}$, with the $\log$ at the given `base`. Alternatively, `TsallisExtropy` can be used with `information_normalized`, which ensures that the computed extropy is on the interval $[0, 1]$ by normalizing to the maximal Tsallis extropy, given by

\[J_T(P) = \dfrac{(N - 1)N^{q - 1} - (N - 1)^q}{(q - 1)N^{q - 1}}\]

`ComplexityMeasures.ElectronicEntropy` — Type

```
ElectronicEntropy <: InformationMeasure
ElectronicEntropy(; h = Shannon(; base = 2), j = ShannonExtropy(; base = 2))
```

The "electronic entropy" measure, defined in discrete form in Lad *et al.* (2015) as

\[H_{EL}(p) = H_S(p) + J_S(p),\]

where $H_S(p)$ is the `Shannon` entropy and $J_S(p)$ is the `ShannonExtropy` of the probability vector $p$.
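A brief sketch of the defining identity (the probabilities are illustrative):

```
p = Probabilities([0.3, 0.7])
h = information(Shannon(), p)
j = information(ShannonExtropy(), p)
information(ElectronicEntropy(), p) ≈ h + j  # true, by definition
```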

`ComplexityMeasures.FluctuationComplexity` — Type

```
FluctuationComplexity <: InformationMeasure
FluctuationComplexity(; definition = Shannon(; base = 2), base = 2)
```

The "fluctuation complexity" quantifies the standard deviation of the information content of the states $\omega_i$ around some summary statistic (`InformationMeasure`) of a PMF. Specifically, given some outcome space $\Omega$ with outcomes $\omega_i \in \Omega$ and a probability mass function $p(\Omega) = \{ p(\omega_i) \}_{i=1}^N$, it is defined as

\[\sigma_I(p) := \sqrt{\sum_{i=1}^N p_i(I_i - H_*)^2}\]

where $I_i = -\log_{base}(p_i)$ is the information content of the $i$-th outcome. The type of information measure $*$ is controlled by `definition`.

The `base` controls the base of the logarithm that goes into the information content terms. Make sure that you pick a `base` that is consistent with the base chosen for the `definition` (relevant for e.g. `Shannon`).

**Properties**

If `definition` is the `Shannon` entropy, then we recover the Shannon-type information fluctuation complexity from Bates and Shepard (1993). In that case, the fluctuation complexity is zero for PMFs with only a single non-zero element, and also for the uniform distribution.

If `definition` is not Shannon entropy, then the properties of the measure vary, and it does not necessarily share the properties derived in Bates and Shepard (1993).

As far as we know, using other information measures besides Shannon entropy for the fluctuation complexity hasn't been explored in the literature yet. Our implementation, however, allows for it. Please inform us if you try some new combinations!
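A minimal sketch of the two zero-valued cases mentioned above (illustrative PMFs; the default `Shannon`-based definition is assumed):

```
p_uniform = Probabilities(fill(0.25, 4))
p_skewed  = Probabilities([0.7, 0.1, 0.1, 0.1])
information(FluctuationComplexity(), p_uniform)  # 0.0: every outcome is equally surprising
information(FluctuationComplexity(), p_skewed)   # > 0: information content fluctuates around H
```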

## Discrete information estimators

`ComplexityMeasures.DiscreteInfoEstimator` — Type

`DiscreteInfoEstimator`

The supertype of all discrete information measure estimators, which are used in combination with a `ProbabilitiesEstimator` as input to `information` or related functions.

The first argument to a discrete estimator is always an `InformationMeasure` (defaults to `Shannon`).

**Description**

A discrete `InformationMeasure` is a functional of a probability mass function. To estimate such a measure from data, we must first estimate a probability mass function using a `ProbabilitiesEstimator` from the (encoded/discretized) input data, and then apply the estimator to the estimated probabilities. For example, the `Shannon` entropy is typically computed using the `RelativeAmount` estimator to compute probabilities, which are then given to the `PlugIn` estimator. Many other estimators exist, not only for `Shannon` entropy, but for other information measures as well.

We provide a library of both generic estimators such as `PlugIn` or `Jackknife` (which can be applied to any measure), as well as dedicated estimators such as `MillerMadow`, which computes `Shannon` entropy using the Miller-Madow bias correction. The list below gives a complete overview.

**Implementations**

The following estimators are generic and can compute any `InformationMeasure`.

- `PlugIn`. The default, generic plug-in estimator of any information measure. It computes the measure exactly as stated in the definition, using the computed probability mass function.
- `Jackknife`. Uses a combination of the plug-in estimator and the jackknife principle to estimate the information measure.

**`Shannon` entropy estimators**

The following estimators are dedicated `Shannon` entropy estimators, which provide improvements over the naive `PlugIn` estimator. These include `MillerMadow`, `Schuermann`, `GeneralizedSchuermann`, `HorvitzThompson`, and `ChaoShen` (documented below).

Any of the implemented `DiscreteInfoEstimator`s can be used in combination with *any* `ProbabilitiesEstimator` as input to `information`. What this means is that every estimator actually comes in many different variants - one for each `ProbabilitiesEstimator`. For example, the `MillerMadow` estimator of `Shannon` entropy is typically calculated with `RelativeAmount` probabilities. But here, you can use for example the `BayesianRegularization` or the `Shrinkage` probabilities estimators instead, i.e. `information(MillerMadow(), RelativeAmount(), outcome_space, x)` and `information(MillerMadow(), BayesianRegularization(), outcome_space, x)` are distinct estimators. This holds for all `DiscreteInfoEstimator`s. Many of these estimators haven't been explored in the literature before, so feel free to explore, and please cite this software if you use it to explore some new estimator combination!

`ComplexityMeasures.PlugIn` — Type

`PlugIn(e::InformationMeasure) <: DiscreteInfoEstimatorGeneric`

The `PlugIn` estimator is also called the empirical/naive/"maximum likelihood" estimator, and is used with `information` to compute any discrete `InformationMeasure`.

It computes any quantity exactly as given by its formula. When computing an information measure, which here is defined as a probabilities functional, it computes the quantity directly from a probability mass function, which is derived from maximum-likelihood (`RelativeAmount`) estimates of the probabilities.

**Bias of plug-in estimates**

The plug-in estimator of `Shannon` entropy underestimates the true entropy, with a bias that grows with the number of distinct `outcomes` (Arora *et al.*, 2022),

\[bias(H_S^{plugin}) = -\dfrac{K-1}{2N} + o(N^{-1}),\]

where `K` is the number of distinct outcomes, and `N` is the sample size. Many authors have tried to remedy this by proposing alternative Shannon entropy estimators. For example, the `MillerMadow` estimator is a simple correction to the plug-in estimator that adds back the bias term above. Many other estimators exist; see `DiscreteInfoEstimator`s for an overview.

`ComplexityMeasures.MillerMadow` — Type

```
MillerMadow <: DiscreteInfoEstimatorShannon
MillerMadow(measure::Shannon = Shannon())
```

The `MillerMadow` estimator is used with `information` to compute the discrete `Shannon` entropy according to Miller (1955).

**Description**

The Miller-Madow estimator of Shannon entropy is given by

\[H_S^{MM} = H_S^{plugin} + \dfrac{m - 1}{2N},\]

where $H_S^{plugin}$ is the Shannon entropy estimated using the `PlugIn` estimator, `m` is the number of bins with nonzero probability (as defined in Paninski (2003)), and `N` is the number of observations.

`ComplexityMeasures.Schuermann` — Type

```
Schuermann <: DiscreteInfoEstimatorShannon
Schuermann(definition::Shannon; a = 1.0)
```

The `Schuermann` estimator is used with `information` to compute the discrete `Shannon` entropy with the bias-corrected estimator given in Schürmann (2004).

See the detailed description of `GeneralizedSchuermann` for details.

`ComplexityMeasures.GeneralizedSchuermann` — Type

```
GeneralizedSchuermann <: DiscreteInfoEstimatorShannon
GeneralizedSchuermann(definition = Shannon(); a = 1.0)
```

The `GeneralizedSchuermann` estimator is used with `information` to compute the discrete `Shannon` entropy with the bias-corrected estimator given in Grassberger (2022).

The "generalized" part of the name, as opposed to the Schürmann (2004) estimator (`Schuermann`), is due to the possibility of picking different parameters $a_i$ for different outcomes. If different parameters are assigned to the different outcomes, `a` must be a vector of parameters of length `length(outcomes)`, where the outcomes are obtained using `outcomes`. See Grassberger (2022) for more information. If `a` is a real number, then $a_i = a$ for all $i$, and the estimator reduces to the `Schuermann` estimator.

**Description**

For a set of $N$ observations over $M$ outcomes, the estimator is given by

\[H_S^{opt} = \varphi(N) - \dfrac{1}{N} \sum_{i=1}^M n_i G_{n_i}(a_i),\]

where $n_i$ is the observed frequency of the $i$-th outcome,

\[G_n(a) = \varphi(n) + (-1)^n \int_0^a \dfrac{x^{n - 1}}{x + 1} dx,\]

$G_n(1) = G_n$ and $G_n(0) = \varphi(n)$, and

\[G_n = \varphi(n) + (-1)^n \int_0^1 \dfrac{x^{n - 1}}{x + 1} dx.\]

`ComplexityMeasures.Jackknife` — Type

```
Jackknife <: DiscreteInfoEstimatorGeneric
Jackknife(definition::InformationMeasure = Shannon())
```

The `Jackknife` estimator is used with `information` to compute any discrete `InformationMeasure`.

The `Jackknife` estimator uses the generic jackknife principle to reduce bias. Zahl (1977) was the first to apply the jackknife technique in the context of `Shannon` entropy estimation. Here, we've generalized his estimator to work with any `InformationMeasure`.

**Description**

As an example of the jackknife technique, here is the formula for a jackknife estimate of `Shannon` entropy

\[H_S^{J} = N H_S^{plugin} - \dfrac{N-1}{N} \sum_{i=1}^N {H_S^{plugin}}^{-\{i\}},\]

where $N$ is the sample size, $H_S^{plugin}$ is the plug-in estimate of Shannon entropy, and ${H_S^{plugin}}^{-\{i\}}$ is the plug-in estimate, but computed with the $i$-th sample left out.
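A small usage sketch (the choice of outcome space and data is an illustrative assumption):

```
x = randn(500)
o = OrdinalPatterns(; m = 3)
h_plugin = information(PlugIn(Shannon()), o, x)     # naive plug-in estimate
h_jack   = information(Jackknife(Shannon()), o, x)  # bias-reduced jackknife estimate
```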

`ComplexityMeasures.HorvitzThompson` — Type

```
HorvitzThompson <: DiscreteInfoEstimatorShannon
HorvitzThompson(measure::Shannon = Shannon())
```

The `HorvitzThompson` estimator is used with `information` to compute the discrete `Shannon` entropy according to Horvitz and Thompson (1952).

**Description**

The Horvitz-Thompson estimator of `Shannon` entropy is given by

\[H_S^{HT} = -\sum_{i=1}^M \dfrac{p_i \log(p_i) }{1 - (1 - p_i)^N},\]

where $N$ is the sample size and $M$ is the number of `outcomes`. Given the true probability $p_i$ of the $i$-th outcome, $1 - (1 - p_i)^N$ is the probability that the outcome appears at least once in a sample of size $N$ (Arora *et al.*, 2022). Dividing by this inclusion probability is a form of weighting, and compensates for situations where certain outcomes have such low probabilities that they are rarely observed in a sample, for example in power-law distributions.

`ComplexityMeasures.ChaoShen` — Type

```
ChaoShen <: DiscreteInfoEstimatorShannon
ChaoShen(definition::Shannon = Shannon())
```

The `ChaoShen` estimator is used with `information` to compute the discrete `Shannon` entropy according to Chao and Shen (2003).

**Description**

This estimator is a modification of the `HorvitzThompson` estimator that multiplies each plug-in probability estimate by an estimate of sample coverage. If $f_1$ is the number of singletons (outcomes that occur only once) in a sample of length $N$, then the sample coverage is $C = 1 - \dfrac{f_1}{N}$. The Chao-Shen estimator of Shannon entropy is then

\[H_S^{CS} = -\sum_{i=1}^M \left( \dfrac{C p_i \log(C p_i)}{1 - (1 - C p_i)^N} \right),\]

where $N$ is the sample size and $M$ is the number of `outcomes`. If $f_1 = N$, then $f_1$ is set to $f_1 = N - 1$ to ensure positive entropy (Arora *et al.*, 2022).

## Differential information estimators

`ComplexityMeasures.DifferentialInfoEstimator` — Type

`DifferentialInfoEstimator`

The supertype of all differential information measure estimators. These estimators compute an information measure in various ways that do not involve explicitly estimating a probability distribution.

Each `DifferentialInfoEstimator` uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, `Kraskov` estimates the `Shannon` entropy.

See `information` for usage.

**Implementations**

- `Kraskov`
- `KozachenkoLeonenko`
- `Zhu`
- `ZhuSingh`
- `Gao`
- `Goria`
- `Lord`
- `LeonenkoProzantoSavani`
- `Vasicek`
- `AlizadehArghami`
- `Ebrahimi`
- `Correa`

Each of these is documented below.

`ComplexityMeasures.Kraskov` — Type

```
Kraskov <: DifferentialInfoEstimator
Kraskov(definition = Shannon(); k::Int = 1, w::Int = 0)
```

The `Kraskov` estimator computes the `Shannon` differential `information` of a multi-dimensional `StateSpaceSet` using the `k`-th nearest neighbor search method from Kraskov *et al.* (2004), with logarithms to the `base` specified in `definition`.

`w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Kraskov` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

See also: `information`, `KozachenkoLeonenko`, `DifferentialInfoEstimator`.

`ComplexityMeasures.KozachenkoLeonenko` — Type

```
KozachenkoLeonenko <: DifferentialInfoEstimator
KozachenkoLeonenko(definition = Shannon(); w::Int = 0)
```

The `KozachenkoLeonenko` estimator (Kozachenko and Leonenko, 1987) computes the `Shannon` differential `information` of a multi-dimensional `StateSpaceSet`, with logarithms to the `base` specified in `definition`.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `KozachenkoLeonenko` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

using the nearest neighbor method from Kozachenko and Leonenko (1987), as described in Charzyńska and Gambin (2016).

`w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

In contrast to `Kraskov`, this estimator uses only the *closest* neighbor.

See also: `information`, `Kraskov`, `DifferentialInfoEstimator`.

`ComplexityMeasures.Zhu` — Type

```
Zhu <: DifferentialInfoEstimator
Zhu(; definition = Shannon(), k = 1, w = 0)
```

The `Zhu` estimator (Zhu *et al.*, 2015) is an extension of `KozachenkoLeonenko`, and computes the `Shannon` differential `information` of a multi-dimensional `StateSpaceSet`, with logarithms to the `base` specified in `definition`.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Zhu` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))]\]

by approximating densities within hyperrectangles surrounding each point `xᵢ ∈ x` using `k` nearest neighbor searches. `w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

See also: `information`, `KozachenkoLeonenko`, `DifferentialInfoEstimator`.

`ComplexityMeasures.ZhuSingh` — Type

```
ZhuSingh <: DifferentialInfoEstimator
ZhuSingh(definition = Shannon(); k = 1, w = 0)
```

The `ZhuSingh` estimator (Zhu *et al.*, 2015) computes the `Shannon` differential `information` of a multi-dimensional `StateSpaceSet`, with logarithms to the `base` specified in `definition`.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `ZhuSingh` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Like `Zhu`, this estimator approximates probabilities within hyperrectangles surrounding each point `xᵢ ∈ x` using `k` nearest neighbor searches. However, it also considers the number of neighbors falling on the borders of these hyperrectangles. This estimator is an extension of the entropy estimator in Singh *et al.* (2003).

`w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

See also: `information`, `DifferentialInfoEstimator`.

`ComplexityMeasures.Gao` — Type

```
Gao <: DifferentialInfoEstimator
Gao(definition = Shannon(); k = 1, w = 0, corrected = true)
```

The `Gao` estimator (Gao *et al.*, 2015) computes the `Shannon` differential `information`, using a `k`-th nearest-neighbor approach based on Singh *et al.* (2003), with logarithms to the `base` specified in `definition`.

`w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

Gao *et al.* (2015) give two variants of this estimator. If `corrected == false`, then the uncorrected version is used. If `corrected == true`, then the corrected version is used, which ensures that the estimator is asymptotically unbiased.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Gao` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

`ComplexityMeasures.Goria` — Type

```
Goria <: DifferentialInfoEstimator
Goria(measure = Shannon(); k = 1, w = 0)
```

The `Goria` estimator (Goria *et al.*, 2005) computes the `Shannon` differential `information` of a multi-dimensional `StateSpaceSet`, with logarithms to the `base` specified in `definition`.

**Description**

Assume we have samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Goria` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

Specifically, let $\bf{n}_1, \bf{n}_2, \ldots, \bf{n}_N$ be the distances of the samples $\{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ to their `k`-th nearest neighbors. Next, let the geometric mean of the distances be

\[\hat{\rho}_k = \left( \prod_{i=1}^N \bf{n}_i \right)^{\frac{1}{N}}.\]

Goria *et al.* (2005)'s estimate of Shannon differential entropy is then

\[\hat{H} = m \log(\hat{\rho}_k) + \log(N - 1) - \psi(k) + \log c_1(m),\]

where $c_1(m) = \dfrac{2\pi^\frac{m}{2}}{m \Gamma(m/2)}$ and $\psi$ is the digamma function.

`ComplexityMeasures.Lord` — Type

```
Lord <: DifferentialInfoEstimator
Lord(measure = Shannon(); k = 10, w = 0)
```

The `Lord` estimator (Lord *et al.*, 2018) estimates the `Shannon` differential `information` using a nearest neighbor approach with a local nonuniformity correction (LNC), with logarithms to the `base` specified in `definition`.

`w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

**Description**

Assume we have samples $\bar{X} = \{\bf{x}_1, \bf{x}_2, \ldots, \bf{x}_N \}$ from a continuous random variable $X \in \mathbb{R}^d$ with support $\mathcal{X}$ and density function $f : \mathbb{R}^d \to \mathbb{R}$. `Lord` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))],\]

by using the resubstitution formula

\[\hat{H}(\bar{X}, k) = -\mathbb{E}[\log(f(X))] \approx -\dfrac{1}{N} \sum_{i = 1}^N \log(\hat{f}(\bf{x}_i)),\]

where $\hat{f}(\bf{x}_i)$ is an estimate of the density at $\bf{x}_i$ constructed in a manner such that $\hat{f}(\bf{x}_i) \propto \dfrac{k(x_i) / N}{V_i}$, where $k(x_i)$ is the number of points in the neighborhood of $\bf{x}_i$, and $V_i$ is the volume of that neighborhood.

While most nearest-neighbor based differential entropy estimators use regular volume elements (e.g. hypercubes, hyperrectangles, hyperspheres) for approximating the local densities $\hat{f}(\bf{x}_i)$, the `Lord` estimator uses hyperellipsoid volume elements. These hyperellipsoids are, for each query point `xᵢ`, estimated using singular value decomposition (SVD) on the `k`-th nearest neighbors of `xᵢ`. Thus, the hyperellipsoids stretch/compress in response to the local geometry around each sample point. This makes `Lord` a well-suited entropy estimator for a wide range of systems.

`ComplexityMeasures.LeonenkoProzantoSavani` — Type

```
LeonenkoProzantoSavani <: DifferentialInfoEstimator
LeonenkoProzantoSavani(definition = Shannon(); k = 1, w = 0)
```

The `LeonenkoProzantoSavani` estimator (Leonenko *et al.*, 2008) computes the `Shannon`, `Renyi`, or `Tsallis` differential `information` of a multi-dimensional `StateSpaceSet`, with logarithms to the `base` specified in `definition`.

**Description**

The estimator uses `k`-th nearest-neighbor searches. `w` is the Theiler window, which determines if temporal neighbors are excluded during neighbor searches (defaults to `0`, meaning that only the point itself is excluded when searching for neighbours).

For details, see Leonenko *et al.* (2008).
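A brief sketch of using the same estimator for different measures (the sample size and `k` are illustrative assumptions; for a standard normal, the Shannon value should approach `0.5*log(2π) + 0.5 ≈ 1.42` nats):

```
x = randn(50_000)
h_sh = information(LeonenkoProzantoSavani(Shannon(; base = ℯ); k = 5), x)        # ≈ 1.42 nats
h_r2 = information(LeonenkoProzantoSavani(Renyi(; q = 2.0, base = ℯ); k = 5), x)
h_t2 = information(LeonenkoProzantoSavani(Tsallis(; q = 2.0, base = ℯ); k = 5), x)
```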

`ComplexityMeasures.Vasicek` — Type

```
Vasicek <: DifferentialInfoEstimator
Vasicek(definition = Shannon(); m::Int = 1)
```

The `Vasicek` estimator computes the `Shannon` differential `information` of a timeseries using the method from Vasicek (1976), with logarithms to the `base` specified in `definition`.

The `Vasicek` estimator belongs to a class of differential entropy estimators based on order statistics, of which Vasicek (1976) was the first. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. `Vasicek` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `Vasicek` `Shannon` differential entropy estimate is then

\[\hat{H}_V(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{2m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right].\]

**Usage**

In practice, the choice of `m` influences how fast the entropy converges to the true value. For small values of `m`, convergence is slow, so we recommend scaling `m` with the time series length `n`, using `m >= n/100` (this is just a heuristic based on the tests written for this package).
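A minimal sketch of this heuristic (data and numbers are illustrative):

```
x = randn(10_000)
m = max(1, round(Int, length(x) / 100))           # heuristic: m ≥ n/100
h = information(Vasicek(Shannon(; base = ℯ); m = m), x)
# ≈ 0.5*log(2π) + 0.5 ≈ 1.42 nats for a standard normal
```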

See also: `information`, `Correa`, `AlizadehArghami`, `Ebrahimi`, `DifferentialInfoEstimator`.

`ComplexityMeasures.AlizadehArghami` — Type

```
AlizadehArghami <: DifferentialInfoEstimator
AlizadehArghami(definition = Shannon(); m::Int = 1)
```

The `AlizadehArghami` estimator computes the `Shannon` differential `information` of a timeseries using the method from Alizadeh and Arghami (2010), with logarithms to the `base` specified in `definition`.

The `AlizadehArghami` estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. `AlizadehArghami` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$:

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `AlizadehArghami` `Shannon` differential entropy estimate is then the `Vasicek` estimate $\hat{H}_{V}(\bar{X}, m, n)$, plus a correction factor:

\[\hat{H}_{A}(\bar{X}, m, n) = \hat{H}_{V}(\bar{X}, m, n) + \dfrac{2}{n}\left(m \log(2) \right).\]

See also: `information`, `Correa`, `Ebrahimi`, `Vasicek`, `DifferentialInfoEstimator`.

`ComplexityMeasures.Ebrahimi` — Type

```
Ebrahimi <: DifferentialInfoEstimator
Ebrahimi(definition = Shannon(); m::Int = 1)
```

The `Ebrahimi` estimator computes the `Shannon` differential `information` of a timeseries using the method from Ebrahimi *et al.* (1994), with logarithms to the `base` specified in `definition`.

The `Ebrahimi` estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. `Ebrahimi` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, it makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$. The `Ebrahimi` `Shannon` differential entropy estimate is then

\[\hat{H}_{E}(\bar{X}, m) = \dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{n}{c_i m} (\bar{X}_{(i+m)} - \bar{X}_{(i-m)}) \right],\]

where

\[c_i = \begin{cases} 1 + \frac{i - 1}{m}, & 1 \leq i \leq m \\ 2, & m + 1 \leq i \leq n - m \\ 1 + \frac{n - i}{m}, & n - m + 1 \leq i \leq n \end{cases}.\]

See also: `information`, `Correa`, `AlizadehArghami`, `Vasicek`, `DifferentialInfoEstimator`.

`ComplexityMeasures.Correa` — Type

```
Correa <: DifferentialInfoEstimator
Correa(definition = Shannon(); m::Int = 1)
```

The `Correa` estimator computes the `Shannon` differential `information` of a timeseries using the method from Correa (1995), with logarithms to the `base` specified in `definition`.

The `Correa` estimator belongs to a class of differential entropy estimators based on order statistics. It only works for *timeseries* input.

**Description**

Assume we have samples $\bar{X} = \{x_1, x_2, \ldots, x_N \}$ from a continuous random variable $X \in \mathbb{R}$ with support $\mathcal{X}$ and density function $f : \mathbb{R} \to \mathbb{R}$. `Correa` estimates the `Shannon` differential entropy

\[H(X) = -\int_{\mathcal{X}} f(x) \log f(x) dx = \mathbb{E}[-\log(f(X))].\]

However, instead of estimating the above integral directly, `Correa` makes use of the equivalent integral, where $F$ is the distribution function for $X$,

\[H(X) = \int_0^1 \log \left(\dfrac{d}{dp}F^{-1}(p) \right) dp.\]

This integral is approximated by first computing the order statistics of $\bar{X}$ (the input timeseries), i.e. $x_{(1)} \leq x_{(2)} \leq \cdots \leq x_{(n)}$, ensuring that end points are included. The `Correa` estimate of `Shannon` differential entropy is then

\[H_C(\bar{X}, m, n) = -\dfrac{1}{n} \sum_{i = 1}^n \log \left[ \dfrac{ \sum_{j=i-m}^{i+m}(\bar{X}_{(j)} - \tilde{X}_{(i)})(j - i)}{n \sum_{j=i-m}^{i+m} (\bar{X}_{(j)} - \tilde{X}_{(i)})^2} \right],\]

where

\[\tilde{X}_{(i)} = \dfrac{1}{2m + 1} \sum_{j = i - m}^{i + m} X_{(j)}.\]

See also: `information`, `AlizadehArghami`, `Ebrahimi`, `Vasicek`, `DifferentialInfoEstimator`.

### Table of differential information measure estimators

The following estimators are *differential* information measure estimators, and can also be used with `information`.

Each `DifferentialInfoEstimator` uses a specialized technique to approximate relevant densities/integrals, and is often tailored to one or a few types of information measures. For example, `Kraskov` estimates the `Shannon` entropy.

| Estimator | Principle | Input data | `Shannon` | `Renyi` | `Tsallis` | `Kaniadakis` | `Curado` | `StretchedExponential` |
|---|---|---|---|---|---|---|---|---|
| `KozachenkoLeonenko` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `Kraskov` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `Zhu` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `ZhuSingh` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `Gao` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `Goria` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `Lord` | Nearest neighbors | `StateSpaceSet` | ✓ | x | x | x | x | x |
| `LeonenkoProzantoSavani` | Nearest neighbors | `StateSpaceSet` | ✓ | ✓ | ✓ | x | x | x |
| `Vasicek` | Order statistics | `Vector` | ✓ | x | x | x | x | x |
| `Ebrahimi` | Order statistics | `Vector` | ✓ | x | x | x | x | x |
| `Correa` | Order statistics | `Vector` | ✓ | x | x | x | x | x |
| `AlizadehArghami` | Order statistics | `Vector` | ✓ | x | x | x | x | x |