ComplexityMeasures.jl
A Julia package that provides estimators for probabilities, entropies, and other complexity measures, in the context of nonlinear dynamics, nonlinear timeseries analysis, and complex systems. It can be used as a standalone package, or as part of other projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.
To install it, run `import Pkg; Pkg.add("ComplexityMeasures")`.
All further information is provided in the documentation, which you can either find online or build locally by running the `docs/make.jl` file.
Previously, this package was called Entropies.jl.
Content and terminology
The documentation here follows (loosely) chapter 5 of Nonlinear Dynamics, Datseris & Parlitz, Springer 2022.
Before exploring the features of ComplexityMeasures.jl, it is useful to read through this terminology section. Here, we briefly review important complexity-related concepts and names from the scientific literature, and outline how we've structured ComplexityMeasures.jl around these concepts.
In the scientific literature, terms like "probabilities", "entropies", and other "complexity measures" are used (and abused) in multiple contexts, and are often used interchangeably to describe similar concepts. The API and documentation of ComplexityMeasures.jl aim to clarify the meaning and usage of these terms, and to provide simple ways to obtain probabilities, entropies, or other complexity measures from input data.
For ComplexityMeasures.jl, entropies are also complexity measures, although sometimes a distinction is made so that "complexity measures" means anything beyond entropy. We believe the general nonlinear dynamics community agrees with our take, as most papers that introduce different entropy flavors call them complexity measures. An example is "Permutation Entropy: A Natural Complexity Measure for Time Series" by Bandt and Pompe (2002).
Probabilities
Entropies and other complexity measures are typically computed based on probability distributions, which we simply refer to as "probabilities". Probabilities can be obtained from input data in a plethora of different ways. The central API function that returns a probability distribution is `probabilities`, which takes in a subtype of `ProbabilitiesEstimator` to specify how the probabilities are computed. All available estimators can be found in the estimators page.
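For instance, a histogram-based estimator can turn a timeseries into a probability distribution. The following is a minimal sketch, assuming a v2-style `probabilities(est, x)` call signature and the `ValueHistogram` estimator; consult the estimators page for the exact constructors available in your version.

```julia
using ComplexityMeasures

x = randn(10_000)  # example timeseries

# Bin the values into a rectangular histogram with bin width 0.1,
# then estimate probabilities as normalized visitation counts.
est = ValueHistogram(RectangularBinning(0.1))
probs = probabilities(est, x)  # a probability vector summing to 1
```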
Entropies
Entropy is an established concept in statistics, information theory, and nonlinear dynamics. However, it is also an umbrella term that may mean several computationally, and sometimes even fundamentally, different quantities. In ComplexityMeasures.jl, we provide the generic function `entropy`, which aims both to clarify disparate entropy concepts and to unify them under a common interface that highlights the modular nature of the word "entropy".
At the highest level, there are two main types of entropy:
- Discrete entropies are functions of probability mass functions. Computing a discrete entropy boils down to two simple steps: first estimating a probability distribution, then plugging the estimated probabilities into an estimator of a so-called "generalized entropy" definition. Internally, this is literally just a few lines of code, where we first apply some `ProbabilitiesEstimator` to the input data, and feed the resulting `probabilities` to `entropy` with some `DiscreteEntropyEstimator`.
- Differential/continuous entropies are functions of probability density functions, which are integrals. Computing differential entropies therefore relies on estimating some density functional. For this task, we provide `DifferentialEntropyEstimator`s, which compute entropies via alternate means, without explicitly computing some probability distribution. For example, the `Correa` estimator computes the Shannon differential entropy using order statistics. A short sketch of both routes follows this list.
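Here is a minimal sketch contrasting the two routes, assuming the v2-style signatures `entropy(def, probest, x)` for the discrete case and `entropy(diffest, x)` for the differential case:

```julia
using ComplexityMeasures

x = randn(10_000)

# Discrete route: probabilities estimator + entropy definition.
h_disc = entropy(Shannon(), ValueHistogram(RectangularBinning(0.1)), x)

# Differential route: estimate the Shannon differential entropy
# directly via order statistics, without an explicit distribution.
h_diff = entropy(Correa(), x)
```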
Crucially, many quantities in the nonlinear dynamics literature that are named as entropies, such as "permutation entropy" (`entropy_permutation`) and "wavelet entropy" (`entropy_wavelet`), are not really new entropies. They are the good old discrete Shannon entropy (`Shannon`), but calculated with new probabilities estimators.
Even though the names of these methods (e.g. "wavelet entropy") sound like names for new entropies, what they actually do is to devise novel ways of calculating probabilities from data, and then plug those probabilities into formal discrete entropy formulas such as the Shannon entropy. These probabilities estimators are of course smartly created so that they elegantly highlight important complexity-related aspects of the data.
Names for methods such as "permutation entropy" are commonplace, so in ComplexityMeasures.jl we provide convenience functions like `entropy_permutation`. However, we emphasize that these functions really aren't anything more than 2-lines-of-code wrappers that call `entropy` with the appropriate `ProbabilitiesEstimator`.
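To illustrate, the following sketch (assuming the v2-style API, with `SymbolicPermutation` as the ordinal-pattern probabilities estimator) shows that the convenience wrapper and the explicit call compute the same quantity:

```julia
using ComplexityMeasures

x = randn(10_000)

# The convenience wrapper...
h1 = entropy_permutation(x; m = 3, τ = 1)

# ...is equivalent to Shannon entropy over ordinal-pattern probabilities.
h2 = entropy(Shannon(), SymbolicPermutation(; m = 3, τ = 1), x)

h1 ≈ h2  # true
```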
Genuinely different entropies arise from genuinely different definitions of entropy. And there are a lot of them! Examples are `Shannon` (the classic), `Renyi`, or `Tsallis` entropy. These different definitions can be found in `EntropyDefinition`.
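As a sketch (again assuming the v2-style API), the same probabilities estimator can be combined with any of these definitions:

```julia
using ComplexityMeasures

x = randn(10_000)
est = ValueHistogram(RectangularBinning(0.1))

# Same probabilities, three different entropy definitions.
entropy(Shannon(), est, x)
entropy(Renyi(q = 2.0), est, x)
entropy(Tsallis(q = 2.0), est, x)
```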
Other complexity measures
Other complexity measures, which strictly speaking don't compute entropies, and which may or may not explicitly compute probability distributions, are found in the Complexity measures page. This includes measures like sample entropy and approximate entropy.
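As a hedged sketch, assuming these measures are dispatched through a generic `complexity` function (as in recent versions of the package):

```julia
using ComplexityMeasures

x = randn(10_000)

# Sample entropy with embedding dimension m = 2; the radius r is
# derived from the data by the convenience constructor.
complexity(SampleEntropy(x; m = 2), x)
```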
Input data for ComplexityMeasures.jl
The input data type typically depends on the probability estimator chosen. In general though, we take the standard DynamicalSystems.jl approach, and as such there are three types of input data:
- Timeseries, which are `AbstractVector{<:Real}`, used e.g. with `WaveletOverlap`.
- Multivariate timeseries, also called datasets or state space sets, which are `StateSpaceSet`s, used e.g. with `NaiveKernel`.
- Spatial data, which are higher-dimensional standard `Array`s, used e.g. with `SpatialSymbolicPermutation`.
StateSpaceSets.StateSpaceSet — Type

`StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}`
A dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length `D`. Each point is represented by `SVector{D, T}`. The data are a standard Julia `Vector{SVector}`, and can be obtained with `vec(ssset::StateSpaceSet)`. Typically the order of points in the set is the time direction, but it doesn't have to be.
When indexed with one index, a `StateSpaceSet` behaves like a vector of points. When indexed with two indices, it behaves like a matrix whose columns are the timeseries of the individual variables. When iterated over, it iterates over its contained points. See the description of indexing below for more.
`StateSpaceSet` also supports almost all sensible vector operations, like `append!`, `push!`, `hcat`, and `eachrow`, among others.
Description of indexing
In the following, let `i, j` be integers, `typeof(X) <: AbstractStateSpaceSet`, and `v1, v2` be `<: AbstractVector{Int}` (`v1, v2` could also be ranges, and for performance benefits make `v2` an `SVector{Int}`).
- `X[i] == X[i, :]` gives the `i`th point (returns an `SVector`).
- `X[v1] == X[v1, :]` returns a `StateSpaceSet` with the points in those indices.
- `X[:, j]` gives the `j`th variable timeseries (or collection), as a `Vector`.
- `X[v1, v2]` and `X[:, v2]` return a `StateSpaceSet` with the appropriate entries (first indices being "time"/point index, second being variables).
- `X[i, j]` gives the value of the `j`th variable at the `i`th timepoint.
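A minimal sketch of these indexing rules (the variable names are illustrative only):

```julia
using StateSpaceSets

x, y = rand(100), rand(100)
X = StateSpaceSet(x, y)  # a 2-dimensional state space set

X[3]      # the 3rd point, an SVector of length 2
X[1:10]   # a StateSpaceSet of the first 10 points
X[:, 2]   # the 2nd variable timeseries (== y), as a Vector
X[5, 1]   # value of the 1st variable at the 5th time point
```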
Use `Matrix(ssset)` or `StateSpaceSet(matrix)` to convert. It is assumed that each column of the `matrix` is one variable. If you have various timeseries vectors `x, y, z, ...`, pass them like `StateSpaceSet(x, y, z, ...)`. You can use `columns(dataset)` to obtain the reverse, i.e. all columns of the dataset in a tuple.
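A short conversion sketch, under the same assumptions as above:

```julia
using StateSpaceSets

M = rand(100, 3)      # 100 time points, 3 variables (columns)
X = StateSpaceSet(M)  # a 3-dimensional StateSpaceSet
M2 = Matrix(X)        # back to a 100×3 matrix
cols = columns(X)     # tuple of the 3 column vectors
```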