Complexity measures

On this page we document estimators for complexity measures that are not entropies in the strict mathematical sense. The API closely mirrors the entropy API and is defined by:

complexity
complexity_normalized
ComplexityEstimator

Complexity measures API

ComplexityMeasures.complexity — Function

complexity(c::ComplexityEstimator, x)

Estimate a complexity measure according to c for input data x, where c can be any of the following estimators:

ReverseDispersion.
ApproximateEntropy.
SampleEntropy.
MissingDispersionPatterns.

source

ComplexityMeasures.complexity_normalized — Function

complexity_normalized(c::ComplexityEstimator, x) → m ∈ [a, b]

The same as complexity, but the result is normalized to the interval [a, b], where [a, b] depends on c.

source

ComplexityMeasures.ComplexityEstimator — Type

ComplexityEstimator

Supertype for estimators for various complexity measures that are not entropies in the strict mathematical sense. See complexity for all available estimators.

source

Approximate entropy

ComplexityMeasures.ApproximateEntropy — Type

ApproximateEntropy <: ComplexityEstimator
ApproximateEntropy([x]; r = 0.2std(x), kwargs...)

An estimator for the approximate entropy (ApEn; Pincus, 1991)^[Pincus1991] complexity measure, used with complexity.

The keyword argument r is mandatory if an input timeseries x is not provided.

Keyword arguments

r::Real: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.
m::Int = 2: The embedding dimension.
τ::Int = 1: The embedding lag.
base::Real = MathConstants.e: The base to use for the logarithm. Pincus (1991) uses the natural logarithm.

Description

Approximate entropy is defined as

\[ApEn(m ,r) = \lim_{N \to \infty} \left[ \phi(x, m, r) - \phi(x, m + 1, r) \right].\]

Approximate entropy is estimated for a timeseries x, by first embedding x using embedding dimension m and embedding lag τ, then searching for similar vectors within tolerance radius r, using the estimator described below, with logarithms to the given base (natural logarithm is used in Pincus, 1991).

Specifically, for a finite-length timeseries x, an estimator for $ApEn(m ,r)$ is

\[ApEn(m, r, N) = \phi(x, m, r, N) - \phi(x, m + 1, r, N),\]

where N = length(x) and

\[\phi(x, k, r, N) = \dfrac{1}{N-(k-1)\tau} \sum_{i=1}^{N - (k-1)\tau} \log{\left( \sum_{j = 1}^{N-(k-1)\tau} \dfrac{\theta(d({\bf x}_i^k, {\bf x}_j^k) \leq r)}{N-(k-1)\tau} \right)}.\]

Here, $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, $d({\bf x}_i, {\bf x}_j)$ returns the Chebyshev distance between vectors ${\bf x}_i$ and ${\bf x}_j$, and the k-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

Flexible embedding lag

The original paper fixes τ = 1. In our implementation, the normalization constant is modified to account for embeddings with τ != 1.
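The estimator above can be sketched in plain Julia as follows. This is a minimal, brute-force illustration of the formulas, not the package's implementation; the names `embed_vector`, `chebyshev`, `phi` and `approx_entropy_sketch` are hypothetical helpers introduced only for this example.

```julia
# Delay-embedding vector x_i^k = (x(i), x(i+τ), ..., x(i+(k-1)τ)).
embed_vector(x, i, k, τ) = [x[i + (l - 1) * τ] for l in 1:k]

# Chebyshev (maximum-norm) distance between two vectors of equal length.
chebyshev(a, b) = maximum(abs.(a .- b))

# φ(x, k, r, N) as defined above (hypothetical helper, O(N^2) brute force).
function phi(x, k, r, τ; base = MathConstants.e)
    n = length(x) - (k - 1) * τ                  # number of embedding vectors
    vecs = [embed_vector(x, i, k, τ) for i in 1:n]
    total = 0.0
    for i in 1:n
        # Fraction of vectors within radius r of vecs[i]; the self-match is included.
        frac = count(j -> chebyshev(vecs[i], vecs[j]) <= r, 1:n) / n
        total += log(base, frac)
    end
    return total / n
end

# Finite-sample estimator ApEn(m, r, N) = φ(x, m, r, N) - φ(x, m + 1, r, N).
function approx_entropy_sketch(x; r, m = 2, τ = 1, base = MathConstants.e)
    return phi(x, m, r, τ; base = base) - phi(x, m + 1, r, τ; base = base)
end

constant = fill(1.0, 50)
periodic = [isodd(i) ? 0.0 : 1.0 for i in 1:100]
apen_constant = approx_entropy_sketch(constant; r = 0.1)
apen_periodic = approx_entropy_sketch(periodic; r = 0.1)
```

A constant series gives a value of zero, and a strictly periodic series gives a value close to zero, since knowing m consecutive values leaves almost no uncertainty about the next one.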

source

Sample entropy

ComplexityMeasures.SampleEntropy — Type

SampleEntropy([x]; r = 0.2std(x), kwargs...) <: ComplexityEstimator

An estimator for the sample entropy complexity measure (Richman & Moorman, 2000)^[Richman2000], used with complexity and complexity_normalized.

The keyword argument r is mandatory if an input timeseries x is not provided.

Keyword arguments

r::Real: The radius used when querying for nearest neighbors around points. Its value should be determined from the input data, for example as some proportion of the standard deviation of the data.
m::Int = 1: The embedding dimension.
τ::Int = 1: The embedding lag.

Description

An estimator for sample entropy using radius r, embedding dimension m, and embedding lag τ is

\[SampEn(m, r, N) = -\ln{\dfrac{A(r, m, N)}{B(r, m, N)}}.\]

Here,

\[\begin{aligned} B(r, m, N) &= \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^m, {\bf x}_j^m) \leq r) \\ A(r, m, N) &= \sum_{i = 1}^{N-m\tau} \sum_{j = 1, j \neq i}^{N-m\tau} \theta(d({\bf x}_i^{m+1}, {\bf x}_j^{m+1}) \leq r) \end{aligned},\]

where $\theta(\cdot)$ returns 1 if the argument is true and 0 otherwise, and $d(x, y)$ computes the Chebyshev distance between $x$ and $y$. The vectors ${\bf x}_i^{m}$ and ${\bf x}_i^{m+1}$ are m-dimensional and (m+1)-dimensional embedding vectors, where k-dimensional embedding vectors are constructed from the input timeseries $x(t)$ as

\[{\bf x}_i^k = (x(i), x(i+τ), x(i+2τ), \ldots, x(i+(k-1)\tau)).\]

Quoting Richman & Moorman (2000): "SampEn(m,r,N) will be defined except when B = 0, in which case no regularity has been detected, or when A = 0, which corresponds to a conditional probability of 0 and an infinite value of SampEn(m,r,N)". In these cases, NaN is returned.

If computing the normalized measure, then the resulting sample entropy is on [0, 1].

Flexible embedding lag

The original algorithm fixes τ = 1. All formulas here are modified to account for any τ.
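As an illustration of the counts A and B defined above, here is a minimal brute-force sketch in plain Julia. It is not the package's implementation; `embed_vector`, `chebyshev`, `count_matches` and `sample_entropy_sketch` are hypothetical names used only for this example.

```julia
# Delay-embedding vector x_i^k = (x(i), x(i+τ), ..., x(i+(k-1)τ)).
embed_vector(x, i, k, τ) = [x[i + (l - 1) * τ] for l in 1:k]

# Chebyshev (maximum-norm) distance between two vectors of equal length.
chebyshev(a, b) = maximum(abs.(a .- b))

# Number of ordered pairs (i, j), i ≠ j, of k-dimensional embedding vectors
# within radius r, using the first n embedding vectors.
function count_matches(x, k, n, r, τ)
    vecs = [embed_vector(x, i, k, τ) for i in 1:n]
    matches = 0
    for i in 1:n, j in 1:n
        if i != j && chebyshev(vecs[i], vecs[j]) <= r
            matches += 1
        end
    end
    return matches
end

function sample_entropy_sketch(x; r, m = 1, τ = 1)
    n = length(x) - m * τ                  # same index range for A and B
    B = count_matches(x, m, n, r, τ)       # matches between m-dimensional vectors
    A = count_matches(x, m + 1, n, r, τ)   # matches between (m+1)-dimensional vectors
    # Undefined for B = 0 and infinite for A = 0; NaN is returned in both cases.
    (A == 0 || B == 0) && return NaN
    return -log(A / B)
end

sampen_constant = sample_entropy_sketch(fill(1.0, 50); r = 0.1, m = 2)
sampen_no_match = sample_entropy_sketch(collect(1.0:20.0); r = 0.1, m = 2)
```

Unlike the approximate entropy estimator, self-matches (j = i) are excluded. Since every (m+1)-dimensional match is also an m-dimensional match, A never exceeds B and the estimate is non-negative whenever it is defined.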

Missing dispersion patterns

ComplexityMeasures.MissingDispersionPatterns — Type

MissingDispersionPatterns <: ComplexityEstimator
MissingDispersionPatterns(est = Dispersion())

An estimator for the number of missing dispersion patterns ($N_{MDP}$), a complexity measure which can be used to detect nonlinearity in time series (Zhou et al., 2022)^[Zhou2022].

Used with complexity or complexity_normalized, whose implementation uses missing_outcomes.

Description

If used with complexity, $N_{MDP}$ is computed by first symbolising each xᵢ ∈ x, then embedding the resulting symbol sequence using the dispersion pattern estimator est, and computing the quantity

\[N_{MDP} = L - N_{ODP},\]

where L = total_outcomes(est) (i.e. the total number of possible dispersion patterns), and $N_{ODP}$ is defined as the number of occurring dispersion patterns.

If used with complexity_normalized, then $N_{MDP}^N = (L - N_{ODP})/L$ is computed. The authors recommend that total_outcomes(est.symbolization)^est.m << length(x) - est.m*est.τ + 1 to avoid undersampling.
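The counting step can be sketched in plain Julia as below. The sketch assumes the input has already been symbolised into integers in 1:c, so it skips the encoding step entirely; `missing_patterns_sketch` is a hypothetical name and not part of the package.

```julia
# Count missing dispersion patterns for a symbol sequence with `c` categories,
# pattern length `m` and lag `τ`. Assumes `symbols` only contains values in 1:c.
function missing_patterns_sketch(symbols, c, m; τ = 1)
    n = length(symbols) - (m - 1) * τ      # number of embedded patterns
    # Set of distinct patterns that actually occur in the sequence.
    observed = Set(Tuple(symbols[i + (l - 1) * τ] for l in 1:m) for i in 1:n)
    L = c^m                                # total number of possible patterns
    n_odp = length(observed)               # number of occurring patterns
    n_mdp = L - n_odp                      # number of missing patterns
    return n_mdp, n_mdp / L                # raw and normalized values
end

# The alternating sequence only realises the patterns (1, 2) and (2, 1).
n_mdp, n_mdp_normalized = missing_patterns_sketch([1, 2, 1, 2, 1, 2], 2, 2)
```

For the alternating sequence, two of the four possible patterns are missing, so the raw value is 2 and the normalized value is 0.5.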

Encoding

Dispersion's linear mapping from CDFs to integers is based on equidistant partitioning of the interval [0, 1]. This is slightly different from Zhou et al. (2022), which uses the linear mapping $s_i := \text{round}(y + 0.5)$.

Usage

In Zhou et al. (2022), MissingDispersionPatterns is used to detect nonlinearity in time series by comparing the $N_{MDP}$ for a time series x to $N_{MDP}$ values for an ensemble of surrogates of x. If $N_{MDP} > q_{MDP}^{WIAAFT}$, where $q_{MDP}^{WIAAFT}$ is some q-th quantile of the surrogate ensemble, then it is taken as evidence for nonlinearity.

source

Reverse dispersion entropy

ComplexityMeasures.ReverseDispersion — Type

ReverseDispersion <: ComplexityEstimator
ReverseDispersion(; c = 3, m = 2, τ = 1, check_unique = true)

Estimator for the reverse dispersion entropy complexity measure (Li et al., 2019)^[Li2019].

Description

Li et al. (2019)^[Li2019] define the reverse dispersion entropy as

\[H_{rde} = \sum_{i = 1}^{c^m} \left(p_i - \dfrac{1}{{c^m}} \right)^2 = \left( \sum_{i=1}^{c^m} p_i^2 \right) - \dfrac{1}{c^{m}}\]

where the probabilities $p_i$ are obtained precisely as for the Dispersion probability estimator. Relative frequencies of dispersion patterns are computed using the given encoding scheme, which defaults to encoding using the normal cumulative distribution function (NCDF), as implemented by GaussianCDFEncoding, using embedding dimension m and embedding delay τ. Recommended parameter values^[Li2019] are m ∈ [2, 3], τ = 1 for the embedding, and c ∈ [3, 4, …, 8] categories for the Gaussian mapping.

If normalizing, then the reverse dispersion entropy is normalized to [0, 1].

The minimum value of $H_{rde}$ is zero and occurs precisely when the dispersion pattern distribution is flat, which occurs when all $p_i$s are equal to $1/c^m$. Because $H_{rde} \geq 0$, $H_{rde}$ can therefore be said to be a measure of how far the dispersion pattern probability distribution is from white noise.
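To make the two equivalent forms of $H_{rde}$ concrete, the following plain-Julia sketch evaluates both for a given probability vector. It assumes that `p` lists all $c^m$ pattern probabilities (zeros included), and it assumes that normalization divides by the maximum value $1 - 1/c^m$, which is attained when a single pattern has probability 1. The name `rde_sketch` is hypothetical.

```julia
# Reverse dispersion entropy from a full probability vector of length c^m.
function rde_sketch(p, c, m)
    L = c^m
    length(p) == L || error("p must list all c^m pattern probabilities, zeros included")
    deviation_form = sum((q - 1 / L)^2 for q in p)   # Σ (p_i - 1/c^m)^2
    expanded_form = sum(abs2, p) - 1 / L             # (Σ p_i^2) - 1/c^m
    # Assumed normalization: divide by the maximum value 1 - 1/c^m.
    normalized = expanded_form / (1 - 1 / L)
    return deviation_form, expanded_form, normalized
end

uniform = fill(1 / 9, 9)                  # flat distribution for c = 3, m = 2
delta = [1.0; zeros(8)]                   # a single pattern occurs
rde_uniform = rde_sketch(uniform, 3, 2)
rde_delta = rde_sketch(delta, 3, 2)
```

The flat distribution gives zero (up to rounding) in both forms, while the single-pattern distribution gives 8/9 in both forms and a normalized value of 1.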

Data requirements

The input must have more than one unique element for the default GaussianCDFEncoding to be well-defined. Li et al. (2019)^[Li2019] recommend that x has at least 1000 data points.

If check_unique == true (default), then it is checked that the input has more than one unique value. If check_unique == false and the input only has one unique element, then an InexactError is thrown when trying to compute probabilities.

source

Statistical complexity

ComplexityMeasures.StatisticalComplexity — Type

StatisticalComplexity <: ComplexityEstimator
StatisticalComplexity([x]; kwargs...)

An estimator for the statistical complexity and entropy according to Rosso et al. (2007)^[Rosso2007], used with complexity.

Keyword arguments

est::ProbabilitiesEstimator = SymbolicPermutation(): which estimator to use to get the probabilities
dist<:SemiMetric = JSDivergence(): the distance measure between the estimated probability distribution and a uniform distribution with the same maximal number of bins

Description

Statistical complexity is defined as

\[C_q[P] = \mathcal{H}_q\cdot \mathcal{Q}_q[P],\]

where $Q_q$ is a "disequilibrium" obtained from a distance measure and $H_q$ is a disorder measure. In the original paper^[Rosso2007], this complexity measure was defined via an ordinal pattern-based probability distribution, the Shannon entropy, and the Jensen-Shannon divergence as a distance measure. This implementation allows for a generalization of the complexity measure as developed in ^[Rosso2013]. Here, $H_q$ can be the (q-order) Shannon, Rényi, or Tsallis entropy, and $Q_q$ is based on either the Euclidean, Wootters, Kullback, q-Kullback, Jensen, or q-Jensen distance as

\[Q_q[P] = Q_q^0\cdot D[P, P_e],\]

where $D[P, P_e]$ is the distance between the obtained distribution $P$ and a uniform distribution with the same maximum number of bins, measured by the distance measure dist.
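For the original Shannon and Jensen-Shannon case, the definition can be sketched in plain Julia as follows. This is an illustration under stated assumptions rather than the package's implementation: the input is an already estimated probability vector, the entropy is normalized by its maximum $\ln N$, and the constant $Q^0$ is taken to be the reciprocal of the largest possible divergence from the uniform distribution (attained when one outcome has probability 1). The names `shannon`, `jsd` and `statistical_complexity_sketch` are hypothetical.

```julia
# Shannon entropy (natural logarithm), with the convention 0 * log(0) = 0.
shannon(p) = -sum(q > 0 ? q * log(q) : 0.0 for q in p)

# Jensen-Shannon divergence between two distributions of equal length.
jsd(p, q) = shannon((p .+ q) ./ 2) - shannon(p) / 2 - shannon(q) / 2

function statistical_complexity_sketch(p)
    n = length(p)
    pe = fill(1 / n, n)                 # uniform reference distribution P_e
    delta = [1.0; zeros(n - 1)]         # distribution furthest from uniform
    q0 = 1 / jsd(delta, pe)             # assumed normalization constant Q^0
    H = shannon(p) / log(n)             # normalized entropy in [0, 1]
    Q = q0 * jsd(p, pe)                 # disequilibrium in [0, 1]
    return H, H * Q                     # (entropy, statistical complexity)
end

H, C = statistical_complexity_sketch([0.5, 0.25, 0.125, 0.125])
H_uniform, C_uniform = statistical_complexity_sketch(fill(0.25, 4))
```

The complexity vanishes at both extremes: for the uniform distribution the disequilibrium is zero, and for a single-outcome distribution the entropy is zero. Intermediate distributions, such as the first one above, give a strictly positive complexity.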

Usage

The statistical complexity is exclusively used in combination with the related information measure (entropy). complexity(c::StatisticalComplexity, x) returns only the statistical complexity. The entropy can be accessed as a Ref value of the struct as

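using ComplexityMeasures

x = randn(100)
c = StatisticalComplexity()   # construct an estimator instance, not the bare type
compl = complexity(c, x)      # statistical complexity
entr = c.entr_val[]           # entropy, read from the Ref field of the struct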

To obtain both the entropy and the statistical complexity together as a Tuple, use the wrapper entropy_complexity.

source

ComplexityMeasures.entropy_complexity — Function

entropy_complexity(c::StatisticalComplexity, x)

Return both the entropy and the corresponding StatisticalComplexity. Useful when wanting to plot data on the "entropy-complexity plane". See also entropy_complexity_curves.

source

ComplexityMeasures.entropy_complexity_curves — Function

entropy_complexity_curves(c::StatisticalComplexity; num_max=1, num_min=1000) -> (min_entropy_complexity, max_entropy_complexity)

Calculate the minimum and maximum complexity-entropy curves for the statistical complexity according to ^[Rosso2007]. The maximum complexity curve is evaluated at num_max * total_outcomes(c.est) different values of the normalized information measure of choice, and the minimum complexity curve at num_min different values.

Description

By construction, the statistical complexity has a minimum and a maximum possible value for data with a given permutation entropy. The calculation time of the maximum complexity curve grows as O(total_outcomes(c.est)^2), and thus becomes very long for high numbers of outcomes. This function is inspired by S. Sippel's implementation in statcomp ^[statcomp].

This function will work with any ProbabilitiesEstimator where total_outcomes is known a priori.

source

[Pincus1991] Pincus, S. M. (1991). Approximate entropy as a measure of system complexity. Proceedings of the National Academy of Sciences, 88(6), 2297-2301.
[Richman2000] Richman, J. S., & Moorman, J. R. (2000). Physiological time-series analysis using approximate entropy and sample entropy. American Journal of Physiology-Heart and Circulatory Physiology, 278(6), H2039-H2049.
[Zhou2022] Zhou, Q., Shang, P., & Zhang, B. (2022). Using missing dispersion patterns to detect determinism and nonlinearity in time series data. Nonlinear Dynamics, 1-20.
[Li2019] Li, Y., Gao, X., & Wang, L. (2019). Reverse dispersion entropy: a new complexity measure for sensor signal. Sensors, 19(23), 5203.
[Rosso2007] Rosso, O. A., Larrondo, H. A., Martin, M. T., Plastino, A., & Fuentes, M. A. (2007). Distinguishing noise from chaos. Physical Review Letters, 99(15), 154102. https://doi.org/10.1103/PhysRevLett.99.154102
[Rosso2013] Rosso, O. A. (2013). Generalized Statistical Complexity: A New Tool for Dynamical Systems.
[statcomp] Sippel, S., Lange, H., & Gans, F. (2019). statcomp: Statistical Complexity and Information Measures for Time Series Analysis.