ComplexityMeasures.jl
ComplexityMeasures — Module
A Julia package that provides estimators for probabilities, entropies, and other complexity measures, in the context of nonlinear dynamics, nonlinear timeseries analysis, and complex systems. It can be used as a standalone package, or as part of several projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.
To install it, run import Pkg; Pkg.add("ComplexityMeasures").
All further information is provided in the documentation, which you can either find online or build locally by running the docs/make.jl file.
Previously, this package was called Entropies.jl.
Content and terminology
The documentation here follows (loosely) chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.
Before exploring the features of ComplexityMeasures.jl, it is useful to read through this terminology section. Here, we briefly review important complexity-related concepts and names from the scientific literature, and outline how we've structured ComplexityMeasures.jl around these concepts.
In the scientific literature, words like probabilities, entropies, and other complexity measures are used (and abused) in multiple contexts, and are often used interchangeably to describe similar concepts. The API and documentation of ComplexityMeasures.jl aim to clarify the meaning and usage of these words, and to provide simple ways to obtain probabilities, entropies, or other complexity measures from input data.
Probabilities
Entropies and other complexity measures are typically computed based on probability distributions (or more precisely probability mass functions), which we simply refer to as "probabilities". Probabilities can be obtained from input data in a plethora of different ways. The central API function that returns a probability distribution is probabilities, which takes in a subtype of ProbabilitiesEstimator to specify how the probabilities are computed. All available estimators can be found on the estimators page.
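To make the point that the same data can yield different probabilities concrete, here is a minimal sketch in plain Julia. It does not call ComplexityMeasures.jl; the helper names count_probabilities and binned_probabilities are hypothetical and only illustrate the idea of swapping the estimation step.

```julia
# Hypothetical helper: relative frequency of each unique value in x.
function count_probabilities(x)
    counts = Dict{eltype(x), Int}()
    for v in x
        counts[v] = get(counts, v, 0) + 1
    end
    outcomes = sort!(collect(keys(counts)))
    return outcomes, [counts[o] / length(x) for o in outcomes]
end

# Hypothetical helper: relative frequency of fixed-width bins of x.
function binned_probabilities(x, binwidth)
    bins = [floor(Int, v / binwidth) for v in x]
    return count_probabilities(bins)
end

x = [0.1, 0.2, 0.2, 0.7, 0.9, 0.9, 0.9, 1.4]
outcomes, p_count = count_probabilities(x)      # 5 outcomes, one per unique value
bins, p_binned = binned_probabilities(x, 0.5)   # 3 outcomes, one per occupied bin
```

Both results are valid probability mass functions of the same input; which one is appropriate depends on the question being asked of the data.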
Entropies
Entropy is an established concept in statistics, information theory, and nonlinear dynamics. However, it is also an umbrella term that may mean several computationally, and sometimes even fundamentally, different quantities. In ComplexityMeasures.jl, we provide the generic function entropy, which tries both to clarify disparate entropy concepts and to unify them under a common interface that highlights the modular nature of the word "entropy". In summary, there are only two main types of entropy.
- Discrete entropies are functions of probabilities (specifically, probability mass functions). Computing a discrete entropy boils down to two simple steps: first estimating a probability distribution, then plugging the estimated probabilities into one of the so-called "generalized entropy" definitions. Internally, this is literally just a few lines of code where we first apply some ProbabilitiesEstimator to the input data, and feed the resulting probabilities to entropy with some EntropyDefinition.
- Differential/continuous entropies are functionals of probability density functions, and are defined as integrals. Computing differential entropies therefore relies on estimating some density functional. For this task, we provide DifferentialEntropyEstimators, which compute entropies via alternate means, without explicitly computing some probability distribution. For example, the Correa estimator computes the Shannon differential entropy using order statistics.
Crucially, many quantities in the nonlinear dynamics literature that are named as entropies, such as "permutation entropy" (entropy_permutation) and "wavelet entropy" (entropy_wavelet), are not really new entropies. They are the good old discrete Shannon entropy (Shannon), but calculated with new probabilities estimators.
Even though the names of these methods (e.g. "wavelet entropy") sound like names for new entropies, they are method names. What these methods actually do is to devise novel ways of calculating probabilities from data, and then plug those probabilities into formal discrete entropy formulas such as the Shannon entropy. These probabilities estimators are of course smartly created so that they elegantly highlight important complexity-related aspects of the data.
Names for methods such as "permutation entropy" are commonplace, so in ComplexityMeasures.jl we provide convenience functions like entropy_permutation. However, we emphasise that these functions really aren't anything more than 2-lines-of-code wrappers that call entropy with the appropriate ProbabilitiesEstimator.
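The "new probabilities, same Shannon formula" idea can be sketched from scratch in plain Julia. This is an illustration only, not the package's implementation: the names shannon, ordinal_probabilities and permutation_entropy are hypothetical, and the pattern encoding via sortperm is an assumption made for brevity.

```julia
# Shannon entropy of a probability vector (zero-probability terms are skipped).
shannon(p; base = 2) = -sum(pᵢ * log(base, pᵢ) for pᵢ in p if pᵢ > 0)

# Hypothetical helper: probabilities of ordinal patterns of length m with lag τ.
function ordinal_probabilities(x; m = 3, τ = 1)
    counts = Dict{Vector{Int}, Int}()
    n = length(x) - (m - 1) * τ              # number of embedded windows
    for i in 1:n
        window = x[i:τ:(i + (m - 1) * τ)]
        pattern = sortperm(window)           # ordinal pattern of this window
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    return [c / n for c in values(counts)]
end

# "Permutation entropy" is then a two-line wrapper: estimate, then plug in.
permutation_entropy(x; m = 3, τ = 1, base = 2) =
    shannon(ordinal_probabilities(x; m, τ); base)

h_monotone = permutation_entropy(collect(1.0:10.0))        # a single pattern occurs
h_zigzag = permutation_entropy([1, 2, 1, 2, 1]; m = 2)     # two equally likely patterns
```

A strictly increasing series produces a single ordinal pattern and therefore zero entropy, while the zigzag series above splits evenly between two patterns and gives one bit.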
What genuinely differ are the definitions of entropy. And there are a lot of them! Examples are Shannon (the classic), Renyi or Tsallis entropy. These different definitions can be found in EntropyDefinition.
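As a rough illustration of how the definitions differ once the probabilities are fixed, the sketch below evaluates the textbook Shannon, Rényi and Tsallis formulas on one probability vector. The function names and keyword defaults are assumptions chosen for this example and are not the package's API.

```julia
# Textbook formulas applied to the same probability vector p.
# Shannon: -Σ pᵢ log(pᵢ)
shannon(p; base = 2) = -sum(pᵢ * log(base, pᵢ) for pᵢ in p if pᵢ > 0)

# Rényi of order q (q ≠ 1): log(Σ pᵢ^q) / (1 - q)
renyi(p; q = 2, base = 2) = log(base, sum(pᵢ^q for pᵢ in p)) / (1 - q)

# Tsallis of order q (q ≠ 1) with constant k: k / (q - 1) * (1 - Σ pᵢ^q)
tsallis(p; q = 2, k = 1) = k / (q - 1) * (1 - sum(pᵢ^q for pᵢ in p))

p = [0.5, 0.25, 0.25]
h_shannon = shannon(p)    # 1.5 bits
h_renyi = renyi(p)        # -log2(0.375)
h_tsallis = tsallis(p)    # 1 - 0.375 = 0.625
```

The three numbers differ even though the input probabilities are identical, which is the sense in which these are genuinely different entropies rather than different estimators.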
Other complexity measures
Other complexity measures, which strictly speaking don't compute entropies, and may or may not explicitly compute probability distributions, are found on the Complexity measures page. This includes measures like sample entropy and approximate entropy.
Input data for ComplexityMeasures.jl
The input data type typically depends on the probability estimator chosen. In general though, the standard DynamicalSystems.jl approach is taken and as such we have three types of input data:
- Timeseries, which are AbstractVector{<:Real}, used e.g. with WaveletOverlap.
- Multi-variate timeseries, or datasets, or state space sets, which are Datasets, used e.g. with NaiveKernel.
- Spatial data, which are higher dimensional standard Arrays, used e.g. with SpatialSymbolicPermutation.
StateSpaceSets.Dataset — Type
Dataset{D, T} <: AbstractDataset{D,T}
A dedicated interface for datasets. It contains equally-sized datapoints of length D, represented by SVector{D, T}. These data are a standard Julia Vector{SVector}, and can be obtained with vec(dataset).
When indexed with 1 index, a dataset is like a vector of datapoints. When indexed with 2 indices, it behaves like a matrix whose columns are the timeseries of the individual variables.
Dataset also supports most sensible operations like append!, push!, hcat, eachrow, among others, and when iterated over, it iterates over its contained points.
Description of indexing
In the following let i, j be integers, typeof(data) <: AbstractDataset and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for massive performance benefits make v2 an SVector{X, Int}).
- data[i] == data[i, :] gives the ith datapoint (returns an SVector).
- data[v1] == data[v1, :] returns a Dataset with the points in those indices.
- data[:, j] gives the jth variable timeseries, as a Vector.
- data[v1, v2], data[:, v2] returns a Dataset with the appropriate entries (first indices being "time"/point index, while second being variables).
- data[i, j] gives the value of the jth variable at the ith timepoint.
Use Matrix(dataset) or Dataset(matrix) to convert. It is assumed that each column of the matrix is one variable. If you have various timeseries vectors x, y, z, ... pass them like Dataset(x, y, z, ...). You can use columns(dataset) to obtain the reverse, i.e. all columns of the dataset in a tuple.
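The row/column convention described above can be previewed with a plain Base Julia Matrix, which is the layout a Dataset converts to and from. This sketch deliberately does not construct a Dataset; it only shows the "rows are points, columns are variables" layout that the two-index form mirrors, and the variable names are illustrative.

```julia
# Two timeseries of equal length, one per variable.
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

m = hcat(x, y)            # 4×2 matrix: each row is a point, each column a variable

point = m[2, :]           # 2nd datapoint, analogous to data[i, :]
series = m[:, 2]          # timeseries of the 2nd variable, analogous to data[:, j]
value = m[3, 1]           # 1st variable at the 3rd timepoint, analogous to data[i, j]
subset = m[[1, 3], :]     # points 1 and 3, analogous to data[v1, :]
cols = Tuple(eachcol(m))  # all variables as a tuple of columns
```

Note that the one-index form data[i] has no Matrix counterpart (m[2] would return a single element), which is one reason a dedicated Dataset type is convenient.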