ComplexityMeasures.jl

ComplexityMeasures — Module

ComplexityMeasures.jl

A Julia package that provides estimators for probabilities, entropies, and other complexity measures, in the context of nonlinear dynamics, nonlinear timeseries analysis, and complex systems. It can be used as a standalone package, or as part of several projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.

To install it, run import Pkg; Pkg.add("ComplexityMeasures").

All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.

Previously, this package was called Entropies.jl.


Content and terminology

Note

The documentation here follows (loosely) chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.

Before exploring the features of ComplexityMeasures.jl, it is useful to read through this terminology section. Here, we briefly review important complexity-related concepts and names from the scientific literature, and outline how we've structured ComplexityMeasures.jl around these concepts.

In the scientific literature, words like probabilities, entropies, and other complexity measures are used (and abused) in multiple contexts, and are often used interchangeably to describe similar concepts. The API and documentation of ComplexityMeasures.jl aim to clarify the meaning and usage of these words, and to provide simple ways to obtain probabilities, entropies, or other complexity measures from input data.

Probabilities

Entropies and other complexity measures are typically computed based on probability distributions (or more precisely probability mass functions), which we simply refer to as "probabilities". Probabilities can be obtained from input data in a plethora of different ways. The central API function that returns a probability distribution is probabilities, which takes in a subtype of ProbabilitiesEstimator to specify how the probabilities are computed. All available estimators can be found in the estimators page.

Entropies

Entropy is an established concept in statistics, information theory, and nonlinear dynamics. However, it is also an umbrella term that may mean several computationally, and sometimes even fundamentally, different quantities. In ComplexityMeasures.jl, we provide the generic function entropy, which tries both to clarify disparate entropy concepts and to unify them under a common interface that highlights the modular nature of the word "entropy". In summary, there are only two main types of entropy.

Discrete entropies are functions of probabilities (specifically, probability mass functions). Computing a discrete entropy boils down to two simple steps: first estimating a probability distribution, then plugging the estimated probabilities into one of the so-called "generalized entropy" definitions. Internally, this is literally just a few lines of code where we first apply some ProbabilitiesEstimator to the input data, and feed the resulting probabilities to entropy with some EntropyDefinition.
Differential/continuous entropies are functionals of probability density functions, defined as integrals over those densities. Computing a differential entropy therefore relies on estimating some density functional. For this task, we provide DifferentialEntropyEstimators, which compute entropies via alternate means, without explicitly computing some probability distribution. For example, the Correa estimator computes the Shannon differential entropy using order statistics.
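
The two steps of a discrete entropy computation can be sketched in plain Julia. This is only an illustration under stated assumptions, not the package's implementation: the helper names ordinal_pattern_probabilities and shannon_entropy are hypothetical, and the probabilities are estimated as relative frequencies of ordinal patterns (the idea behind "permutation entropy").

```julia
# Step 1 (hypothetical helper): estimate probabilities from data as the
# relative frequencies of ordinal patterns of length m with lag τ.
function ordinal_pattern_probabilities(x::AbstractVector{<:Real}; m::Int = 3, τ::Int = 1)
    n = length(x) - (m - 1) * τ              # number of delay vectors
    n > 0 || throw(ArgumentError("time series too short for the chosen m and τ"))
    counts = Dict{Vector{Int}, Int}()
    for i in 1:n
        window = x[i:τ:(i + (m - 1) * τ)]    # delay vector of length m
        pattern = sortperm(window)           # ordinal pattern of the window
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    return [c / n for c in values(counts)]
end

# Step 2 (hypothetical helper): plug the probabilities into an entropy
# definition, here the discrete Shannon entropy.
shannon_entropy(p; base = 2) = -sum(q * log(base, q) for q in p if q > 0)

x = [4, 7, 9, 10, 6, 11, 3]
p = ordinal_pattern_probabilities(x; m = 3, τ = 1)
h = shannon_entropy(p; base = 2)             # ≈ 1.522 bits
```

Swapping step 1 for a different way of estimating probabilities, while leaving step 2 untouched, is what produces the differently named "entropies" discussed in this section.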

Crucially, many quantities in the nonlinear dynamics literature that are named as entropies, such as "permutation entropy" (entropy_permutation) and "wavelet entropy" (entropy_wavelet), are not really new entropies. They are the good old discrete Shannon entropy (Shannon), but calculated with new probabilities estimators.

Even though the names of these methods (e.g. "wavelet entropy") sound like names for new entropies, they are method names. What these methods actually do is to devise novel ways of calculating probabilities from data, and then plug those probabilities into formal discrete entropy formulas such as the Shannon entropy. These probabilities estimators are of course smartly created so that they elegantly highlight important complexity-related aspects of the data.

Names for methods such as "permutation entropy" are commonplace, so in ComplexityMeasures.jl we provide convenience functions like entropy_permutation. However, we emphasise that these functions really aren't anything more than 2-lines-of-code wrappers that call entropy with the appropriate ProbabilitiesEstimator.

What genuinely differ are the definitions of entropy. And there are a lot of them! Examples are the Shannon (the classic), Renyi, and Tsallis entropies. These different definitions can be found in EntropyDefinition.
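
To make the distinction concrete, here is a minimal plain-Julia sketch of three entropy definitions applied to the same probabilities. The function names shannon, renyi and tsallis are hypothetical and written from the textbook formulas (with the Tsallis constant set to 1); they are not the package's types or code.

```julia
# Textbook formulas, applied to a vector of probabilities p that sums to 1.
shannon(p; base = 2) = -sum(x * log(base, x) for x in p if x > 0)

function renyi(p; q, base = 2)
    q ≈ 1 && return shannon(p; base = base)      # Shannon is the q → 1 limit
    return log(base, sum(x^q for x in p)) / (1 - q)
end

function tsallis(p; q)
    # For q → 1 the Tsallis entropy tends to the Shannon entropy in nats.
    q ≈ 1 && return -sum(x * log(x) for x in p if x > 0)
    return (1 - sum(x^q for x in p)) / (q - 1)
end

p = [0.5, 0.25, 0.25]        # the same probabilities for every definition
hs = shannon(p)              # ≈ 1.5
hr = renyi(p; q = 2)         # ≈ 1.415
ht = tsallis(p; q = 2)       # ≈ 0.625
```

The probabilities are identical in all three calls; only the definition changes, and so does the resulting number.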

Other complexity measures

Other complexity measures, which strictly speaking don't compute entropies, and may or may not explicitly compute probability distributions, are found in the Complexity measures page. These include measures like sample entropy and approximate entropy.

Input data for ComplexityMeasures.jl

The input data type typically depends on the probability estimator chosen. In general though, the standard DynamicalSystems.jl approach is taken and as such we have three types of input data:

Timeseries, which are AbstractVector{<:Real}, used e.g. with WaveletOverlap.
Multi-variate timeseries, or datasets, or state space sets, which are Datasets, used e.g. with NaiveKernel.
Spatial data, which are higher dimensional standard Arrays, used e.g. with SpatialSymbolicPermutation.

StateSpaceSets.Dataset — Type

Dataset{D, T} <: AbstractDataset{D,T}

A dedicated interface for datasets. It contains equally-sized datapoints of length D, represented by SVector{D, T}. These data are a standard Julia Vector{SVector}, and can be obtained with vec(dataset).

When indexed with 1 index, a dataset is like a vector of datapoints. When indexed with 2 indices, it behaves like a matrix whose columns are the timeseries of the individual variables.

Dataset also supports most sensible operations like append!, push!, hcat, eachrow, among others, and when iterated over, it iterates over its contained points.

Description of indexing

In the following let i, j be integers, typeof(data) <: AbstractDataset and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for massive performance benefits make v2 an SVector{X, Int}).

data[i] == data[i, :] gives the ith datapoint (returns an SVector)
data[v1] == data[v1, :], returns a Dataset with the points in those indices.
data[:, j] gives the jth variable timeseries, as Vector
data[v1, v2], data[:, v2] returns a Dataset with the appropriate entries (first indices being "time"/point index, while second being variables)
data[i, j] value of the jth variable, at the ith timepoint

Use Matrix(dataset) or Dataset(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like Dataset(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.
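
The indexing and conversion rules above can be tried out with a short sketch. It assumes an installed release of StateSpaceSets.jl that exports Dataset and columns, as described in this docstring; the variable names are only illustrative.

```julia
using StateSpaceSets  # assumption: a release that exports `Dataset` and `columns`

# Three timeseries of equal length become a dataset with 5 points of dimension 3.
x, y, z = collect(1.0:5.0), collect(11.0:15.0), collect(21.0:25.0)
data = Dataset(x, y, z)

point  = data[2]              # 2nd datapoint (an SVector)
series = data[:, 3]           # timeseries of the 3rd variable (a Vector)
value  = data[2, 3]           # 3rd variable at the 2nd timepoint
subset = data[1:3, [1, 3]]    # Dataset with points 1 to 3, variables 1 and 3

M = Matrix(data)              # one column per variable
cols = columns(data)          # all variable timeseries, as a tuple
```

As noted above, passing the variable indices as an SVector instead of [1, 3] should be considerably faster when subsetting variables.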