Probabilities
Be sure you have gone through the Tutorial before going through the API here to have a good idea of the terminology used in ComplexityMeasures.jl.
ComplexityMeasures.jl implements an interface for probabilities that exactly follows the mathematically rigorous formulation of probability spaces. Probability spaces are formalized by an OutcomeSpace $\Omega$. Probabilities are then extracted from data by referencing an outcome space in the functions counts and probabilities. The mathematical formulation of probability spaces is further enhanced by ProbabilitiesEstimator and its subtypes, which may correct theoretically known biases when estimating probabilities from finite data.
In reality, probabilities can be either discrete (mass functions) or continuous (density functions). Currently in ComplexityMeasures.jl, only probability mass functions (i.e., countable $\Omega$) are implemented explicitly. Quantities that are estimated from probability density functions (i.e., uncountable $\Omega$) also exist and are implemented in ComplexityMeasures.jl. However, these are estimated by a one-step process without the intermediate estimation of probabilities.
If $\Omega$ is countable, the process of estimating the outcomes from input data is also called discretization of the input data.
Outcome spaces
ComplexityMeasures.OutcomeSpace — TypeOutcomeSpaceThe supertype for all outcome space implementations.
Description
In ComplexityMeasures.jl, an outcome space defines a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (some form of discretization). In the literature, the outcome space is often also called an "alphabet", while each outcome is called a "symbol" or an "event".
An outcome space also defines a set of rules for mapping input data to each outcome $\omega_i$, a process called encoding, symbolizing, or discretizing in the literature (see encodings). Some OutcomeSpaces first apply a transformation, e.g. a delay embedding, to the data before discretizing/encoding, while other OutcomeSpaces discretize/encode the data directly.
Implementations
| Outcome space | Principle | Input data | Counting-compatible |
|---|---|---|---|
| UniqueElements | Count of unique elements | Any | ✔ |
| ValueBinning | Binning (histogram) | Vector, StateSpaceSet | ✔ |
| OrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✔ |
| SpatialOrdinalPatterns | Ordinal patterns in space | Array | ✔ |
| Dispersion | Dispersion patterns | Vector | ✔ |
| SpatialDispersion | Dispersion patterns in space | Array | ✔ |
| CosineSimilarityBinning | Cosine similarity | Vector | ✔ |
| BubbleSortSwaps | Swap counts when sorting | Vector | ✔ |
| SequentialPairDistances | Sequential state vector distances | Vector, StateSpaceSet | ✔ |
| TransferOperator | Binning (transfer operator) | Vector, StateSpaceSet | ✖ |
| NaiveKernel | Kernel density estimation | StateSpaceSet | ✖ |
| WeightedOrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✖ |
| AmplitudeAwareOrdinalPatterns | Ordinal patterns | Vector, StateSpaceSet | ✖ |
| WaveletOverlap | Wavelet transform | Vector | ✖ |
| PowerSpectrum | Fourier transform | Vector | ✖ |
In the column "input data" it is assumed that the eltype of the input is <: Real.
Usage
Outcome spaces are used as input to
- probabilities/allprobabilities_and_outcomes, for computing probability mass functions.
- outcome_space, which returns the elements of the outcome space.
- total_outcomes, which returns the cardinality of the outcome space.
- counts/counts_and_outcomes/allcounts_and_outcomes, for obtaining raw counts instead of probabilities (only for counting-compatible outcome spaces).
Counting-compatible vs. non-counting compatible outcome spaces
There are two main types of outcome spaces.
- Counting-compatible outcome spaces have a well-defined way of counting how often each point in the (encoded) input data is mapped to a particular outcome $\omega_i$. These outcome spaces use encode to discretize the input data. Examples are OrdinalPatterns (which encodes input data into ordinal patterns) or ValueBinning (which discretizes points onto a regular grid). The table above lists which outcome spaces are counting-compatible.
- Non-counting-compatible outcome spaces have no well-defined way of explicitly counting how often each point in the input data is mapped to a particular outcome $\omega_i$. Instead, these outcome spaces return a vector of pre-normalized "relative counts", one for each outcome $\omega_i$. Examples are WaveletOverlap or PowerSpectrum.
Counting-compatible outcome spaces can be used with any ProbabilitiesEstimator to convert counts into probability mass functions. Non-counting-compatible outcome spaces can only be used with the maximum likelihood (RelativeAmount) probabilities estimator, which estimates probabilities precisely by the relative frequency of each outcome (formally speaking, the RelativeAmount estimator also requires counts, but for the sake of code consistency, we allow it to be used with relative frequencies as well).
The function is_counting_based can be used to check whether an outcome space is based on counting.
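For illustration, a minimal sketch of checking this property (the chosen outcome spaces are just examples):
using ComplexityMeasures
is_counting_based(OrdinalPatterns{3}())  # true: ordinal patterns can be explicitly counted
is_counting_based(PowerSpectrum())       # false: only pre-normalized "relative counts" are available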
Deducing the outcome space (from data)
Some outcome space models can deduce $\Omega$ without knowledge of the input, such as OrdinalPatterns. Other outcome spaces require knowledge of the input data for concretely specifying $\Omega$, such as ValueBinning with RectangularBinning. If o is some outcome space model and x some input data, then outcome_space(o, x) returns the possible outcomes $\Omega$. To get the cardinality of $\Omega$, use total_outcomes.
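As an illustration, here is a minimal sketch (the input data are hypothetical random numbers):
using ComplexityMeasures
x = rand(1000)                             # hypothetical input data
outcome_space(OrdinalPatterns{3}())        # Ω is known without data: all 3! = 6 permutations
o = ValueBinning(RectangularBinning(5))
Ω = outcome_space(o, x)                    # Ω depends on x, since bin edges are deduced from it
total_outcomes(o, x) == length(Ω)          # cardinality of Ω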
Implementation details
The element type of $\Omega$ varies between outcome space models, but it is guaranteed to be hashable and sortable. This allows for conveniently tracking the counts of a specific event across experimental realizations, by using the outcome as a dictionary key and the counts as the value for that key (or, alternatively, the key remains the outcome and one has a vector of probabilities, one for each experimental realization).
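For instance, one could collect the counts of each observed outcome across several realizations in a dictionary keyed by the outcomes; a minimal sketch, using hypothetical randomly generated "realizations":
using ComplexityMeasures
o = OrdinalPatterns{3}()
tally = Dict{Any, Vector{Int}}()     # outcome => counts, one entry per realization
for realization in 1:5
    x = randn(500)                   # hypothetical experimental realization
    cts, outs = counts_and_outcomes(o, x)
    for (c, ω) in zip(cts, outs)
        push!(get!(tally, ω, Int[]), c)
    end
end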
ComplexityMeasures.outcomes — Functionoutcomes(o::OutcomeSpace, x)Return all (unique) outcomes that appear in the (encoded) input data x, according to the given OutcomeSpace. Equivalent to probabilities_and_outcomes(o, x)[2], but for some estimators it may be explicitly extended for better performance.
ComplexityMeasures.outcome_space — Functionoutcome_space(o::OutcomeSpace, x) → ΩReturn a sorted container containing all possible outcomes of o for input x.
For some estimators the concrete outcome space is known without knowledge of input x, in which case the function dispatches to outcome_space(o). In general it is recommended to use the 2-argument version irrespective of estimator.
ComplexityMeasures.total_outcomes — Functiontotal_outcomes(o::OutcomeSpace, x)Return the length (cardinality) of the outcome space $\Omega$ of o.
For some OutcomeSpaces, the cardinality is known without knowledge of input x, in which case the function dispatches to total_outcomes(o). In general it is recommended to use the 2-argument version irrespective of the outcome space.
ComplexityMeasures.missing_outcomes — Functionmissing_outcomes(o::OutcomeSpace, x; all = false) → n::IntCount the number of missing outcomes n (i.e., not occurring in the data) specified by o, given input data x. This function only works for count-based outcome spaces; use missing_probabilities otherwise.
See also: MissingDispersionPatterns.
Count occurrences
ComplexityMeasures.UniqueElements — TypeUniqueElements()An OutcomeSpace based on straightforward counting of distinct elements in a univariate time series or multivariate dataset. This is the same as giving no estimator to probabilities.
Outcome space
The outcome space is the unique sorted values of the input. Hence, input x is needed for a well-defined outcome_space.
Implements
codify. Used for encoding inputs where ordering matters (e.g. time series).
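A minimal usage sketch with hypothetical integer-valued input:
using ComplexityMeasures
x = rand(1:3, 1000)                          # hypothetical input with three distinct values
cts, outs = counts_and_outcomes(UniqueElements(), x)
probs = probabilities(UniqueElements(), x)   # relative frequencies of the unique values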
Histograms
ComplexityMeasures.ValueBinning — TypeValueBinning(b::AbstractBinning) <: OutcomeSpaceAn OutcomeSpace based on binning the values of the data as dictated by the binning scheme b and formally computing their histogram, i.e., the frequencies of points in the bins. An alias to this is VisitationFrequency. Available binnings are subtypes of AbstractBinning.
The ValueBinning estimator has a linearithmic time complexity (n log(n) for n = length(x)) and a linear space complexity (l for l = dimension(x)). This allows computation of probabilities (histograms) of high-dimensional datasets and with small box sizes ε without memory overflow and with maximum performance. For performance reasons, the probabilities returned never contain 0s and are arbitrarily ordered.
ValueBinning(ϵ::Union{Real,Vector})A convenience method that accepts same input as RectangularBinning and initializes this binning directly.
Outcomes
The outcome space for ValueBinning is the unique bins constructed from b. Each bin is identified by its left (lowest-value) corner, because bins are always left-closed-right-open intervals [a, b). The bins are in data units, not integer (cartesian indices units), and are returned as SVectors, i.e., same type as input data.
For convenience, outcome_space returns the outcomes in the same array format as the underlying binning (e.g., Matrix for 2D input).
For FixedRectangularBinning the outcome_space is well-defined from the binning, but for RectangularBinning input x is needed as well.
Implements
codify. Used for encoding inputs where ordering matters (e.g. time series).
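A minimal sketch of estimating a histogram-based probability mass function (the data and bin number are hypothetical):
using ComplexityMeasures
x = randn(10_000)                            # hypothetical timeseries
est = ValueBinning(RectangularBinning(10))   # 10 equal-length bins per axis
probs, bins = probabilities_and_outcomes(est, x)
bins[1]                                      # left corner of the first visited bin, in data units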
ComplexityMeasures.AbstractBinning — TypeAbstractBinningSupertype encompassing RectangularBinning and FixedRectangularBinning.
ComplexityMeasures.RectangularBinning — TypeRectangularBinning(ϵ, precise = false) <: AbstractBinningRectangular box partition of state space using the scheme ϵ, deducing the histogram extent and bin width from the input data.
RectangularBinning is a convenience struct. It is re-cast into FixedRectangularBinning once the data are provided, so see that docstring for info on the bin calculation and the meaning of precise.
Binning instructions are deduced from the type of ϵ as follows:
- ϵ::Int divides each coordinate axis into ϵ equal-length intervals that cover all data.
- ϵ::Float64 divides each coordinate axis into intervals of fixed size ϵ, starting from the axis minima until the data is completely covered by boxes.
- ϵ::Vector{Int} divides the i-th coordinate axis into ϵ[i] equal-length intervals that cover all data.
- ϵ::Vector{Float64} divides the i-th coordinate axis into intervals of fixed size ϵ[i], starting from the axis minima until the data is completely covered by boxes.
RectangularBinning ensures all input data are covered by extending the created ranges if need be.
ComplexityMeasures.FixedRectangularBinning — TypeFixedRectangularBinning <: AbstractBinning
FixedRectangularBinning(ranges::Tuple{<:AbstractRange...}, precise = false)Rectangular box partition of state space where the partition along each dimension is explicitly given by each range ranges, which is a tuple of AbstractRange subtypes. Typically, each range is the output of the range Base function, e.g., ranges = (0:0.1:1, range(0, 1; length = 101), range(2.1, 3.2; step = 0.33)). All ranges must be sorted.
The optional second argument precise dictates whether Julia Base's TwicePrecision is used when searching where a point falls into the range. Useful for edge cases of points being almost exactly on the bin edges, but it is exactly four times as slow, so by default it is false.
Points falling outside the partition do not contribute to probabilities. Bins are always left-closed-right-open: [a, b). This means that the last value of each of the ranges dictates the last right-closing value. This value does not belong to the histogram! E.g., if given a range r = range(0, 1; length = 11), with r[end] = 1, the value 1 is outside the partition and would not attribute any increase of the probability corresponding to the last bin (here [0.9, 1))!
Equivalently, the size of the histogram is histsize = map(r -> length(r)-1, ranges)!
FixedRectangularBinning leads to a well-defined outcome space without knowledge of input data, see ValueBinning.
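A minimal sketch for 1D data (the range is hypothetical):
using ComplexityMeasures
x = rand(1000)                                             # hypothetical data in [0, 1)
fb = FixedRectangularBinning((range(0, 1; length = 11),))  # 10 bins: [0, 0.1), ..., [0.9, 1)
probs = probabilities(ValueBinning(fb), x)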
Symbolic permutations
ComplexityMeasures.OrdinalPatterns — TypeOrdinalPatterns <: OutcomeSpace
OrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)An OutcomeSpace based on length-m ordinal permutation patterns, originally introduced in Bandt and Pompe (2002)'s paper on permutation entropy. Note that m is given as a type parameter, so that when it is a literal integer there are performance accelerations.
When passed to probabilities the output depends on the input data type:
- Univariate data. If applied to a univariate timeseries (AbstractVector), then the timeseries is first embedded using embedding delay τ and dimension m, resulting in embedding vectors $\{ \bf{x}_i \}_{i=1}^{N-(m-1)\tau}$. Then, for each $\bf{x}_i$, we find its permutation pattern $\pi_{i}$. Probabilities are then estimated as the frequencies of the encoded permutation symbols by using UniqueElements. When giving the resulting probabilities to information, the original permutation entropy is computed (Bandt and Pompe, 2002).
- Multivariate data. If applied to a D-dimensional StateSpaceSet, then no embedding is constructed, m must be equal to D, and τ is ignored. Each vector $\bf{x}_i$ of the dataset is mapped directly to its permutation pattern $\pi_{i}$ by comparing the relative magnitudes of the elements of $\bf{x}_i$. Like above, probabilities are estimated as the frequencies of the permutation symbols. The resulting probabilities can be used to compute multivariate permutation entropy (He et al., 2016), although here we don't perform any further subdivision of the permutation patterns (as in Figure 3 of He et al. (2016)).
Internally, OrdinalPatterns uses the OrdinalPatternEncoding to represent ordinal patterns as integers for efficient computations.
See WeightedOrdinalPatterns and AmplitudeAwareOrdinalPatterns for estimators that not only consider ordinal (sorting) patterns, but also incorporate information about within-state-vector amplitudes. For a version of this estimator that can be used on spatial data, see SpatialOrdinalPatterns.
In Bandt and Pompe (2002), equal values are ordered after their order of appearance, but this can lead to erroneous temporal correlations, especially for data with low amplitude resolution (Zunino et al., 2017). Here, by default, if two values are equal, then one of them is randomly assigned as "the largest", using lt = ComplexityMeasures.isless_rand. To get the behaviour from Bandt and Pompe (2002), use lt = Base.isless.
Outcome space
The outcome space Ω for OrdinalPatterns is the set of length-m ordinal patterns (i.e. permutations) that can be formed by the integers 1, 2, …, m. There are factorial(m) such patterns.
For example, the outcome [2, 3, 1] corresponds to the ordinal pattern of having the smallest value in the second position, the next smallest value in the third position, and the next smallest, i.e. the largest value in the first position. See also OrdinalPatternEncoding.
In-place symbolization
OrdinalPatterns also implements the in-place probabilities! for StateSpaceSet input (or embedded vector input) for reducing allocations in looping scenarios. The length of the pre-allocated symbol vector must be the length of the dataset. For example
using ComplexityMeasures
m, N, τ = 2, 100, 1
est = OrdinalPatterns{m}(τ)
x = StateSpaceSet(rand(N, m)) # some input dataset
πs_ts = zeros(Int, N) # length must match length of `x`
p = probabilities!(πs_ts, est, x)ComplexityMeasures.WeightedOrdinalPatterns — TypeWeightedOrdinalPatterns <: OutcomeSpace
WeightedOrdinalPatterns{m}(τ = 1, lt::Function = ComplexityMeasures.isless_rand)A variant of OrdinalPatterns that also incorporates amplitude information, based on the weighted permutation entropy (Fadlallah et al., 2013). The outcome space and arguments are the same as in OrdinalPatterns.
Description
For each ordinal pattern extracted from each state (or delay) vector, a weight is attached to it which is the variance of the vector. Probabilities are then estimated by summing the weights corresponding to the same pattern, instead of just counting the occurrence of the same pattern.
Note: in equation 7, section III, of the original paper, the authors write
\[w_j = \dfrac{1}{m}\sum_{k=1}^m (x_{j-(k-1)\tau} - \mathbf{\hat{x}}_j^{m, \tau})^2.\]
But given the formula they give for the arithmetic mean, this is not the variance of the delay vector $\mathbf{x}_i$, because the indices are mixed: $x_{j+(k-1)\tau}$ in the weights formula, vs. $x_{j+(k+1)\tau}$ in the arithmetic mean formula. Here, delay embedding and computation of the patterns and their weights are completely separated processes, ensuring that we compute the arithmetic mean correctly for each vector of the input dataset (which may be a delay-embedded timeseries).
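A minimal sketch of computing the weighted permutation entropy (the data are hypothetical):
using ComplexityMeasures
x = randn(10_000)                  # hypothetical timeseries
o = WeightedOrdinalPatterns{3}()   # order-3 patterns with default τ = 1
h = information(Shannon(), o, x)   # weighted permutation entropy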
ComplexityMeasures.AmplitudeAwareOrdinalPatterns — TypeAmplitudeAwareOrdinalPatterns <: OutcomeSpace
AmplitudeAwareOrdinalPatterns{m}(τ = 1, A = 0.5, lt = ComplexityMeasures.isless_rand)A variant of OrdinalPatterns that also incorporates amplitude information, based on the amplitude-aware permutation entropy (Azami and Escudero, 2016). The outcome space and arguments are the same as in OrdinalPatterns.
Description
Similarly to WeightedOrdinalPatterns, a weight $w_i$ is attached to each ordinal pattern extracted from each state (or delay) vector $\mathbf{x}_i = (x_1^i, x_2^i, \ldots, x_m^i)$ as
\[w_i = \dfrac{A}{m} \sum_{k=1}^m |x_k^i | + \dfrac{1-A}{m-1} \sum_{k=2}^m |x_{k}^i - x_{k-1}^i|,\]
with $0 \leq A \leq 1$. When $A=0$, only internal differences between the elements of $\mathbf{x}_i$ are weighted. When $A=1$, only the mean amplitude of the state vector elements is weighted. With $0 < A < 1$, a combined weighting is used.
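A minimal sketch of computing the amplitude-aware permutation entropy (the data are hypothetical; the default A = 0.5 is used):
using ComplexityMeasures
x = randn(10_000)                        # hypothetical timeseries
o = AmplitudeAwareOrdinalPatterns{3}()   # order-3 patterns, default A = 0.5, τ = 1
h = information(Shannon(), o, x)         # amplitude-aware permutation entropy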
Dispersion patterns
ComplexityMeasures.Dispersion — TypeDispersion(; c = 5, m = 2, τ = 1, check_unique = true)An OutcomeSpace based on dispersion patterns, originally used by Rostaghi and Azami (2016) to compute the "dispersion entropy", which characterizes the complexity and irregularity of a time series.
Recommended parameter values (Li et al., 2019) are m ∈ [2, 3], τ = 1 for the embedding, and c ∈ [3, 4, …, 8] categories for the Gaussian symbol mapping.
Description
Assume we have a univariate time series $X = \{x_i\}_{i=1}^N$. First, this time series is encoded into a symbol timeseries $S$ using the Gaussian encoding GaussianCDFEncoding with empirical mean μ and empirical standard deviation σ (both determined from $X$), and c as given to Dispersion.
Then, $S$ is embedded into an $m$-dimensional time series, using an embedding lag of $\tau$, which yields a total of $N - (m - 1)\tau$ delay vectors $z_i$, or "dispersion patterns". Since each element of $z_i$ can take on c different values, and each delay vector has m entries, there are c^m possible dispersion patterns. This number is used for normalization when computing dispersion entropy.
The returned probabilities are simply the frequencies of the unique dispersion patterns present in $S$ (i.e., the UniqueElements of $S$).
Outcome space
The outcome space for Dispersion is the unique delay vectors whose elements are the symbols (integers) encoded by the Gaussian CDF, i.e., the unique elements of $S$.
Data requirements and parameters
The input must have more than one unique element for the Gaussian mapping to be well-defined. Li et al. (2019) recommends that x has at least 1000 data points.
If check_unique == true (default), then it is checked that the input has more than one unique value. If check_unique == false and the input only has one unique element, then an InexactError is thrown when trying to compute probabilities.
Each embedding vector is called a "dispersion pattern". Why? Let's consider the case when $m = 5$ and $c = 3$, and use some very imprecise terminology for illustration:
When $c = 3$, values clustering far below the mean are in one group, values clustering around the mean are in a second group, and values clustering far above the mean are in a third group. Then the embedding vector $[2, 2, 2, 2, 2]$ consists of values that are close together (close to the mean), so it represents a set of numbers that are not very spread out (less dispersed). The embedding vector $[1, 1, 2, 3, 3]$, however, represents numbers that are much more spread out (more dispersed), because the categories representing "outliers" both above and below the mean are represented, not only values close to the mean.
For a version of this estimator that can be used on high-dimensional arrays, see SpatialDispersion.
Implements
codify. Used for encoding inputs where ordering matters (e.g. time series).
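A minimal sketch of computing a Shannon-type dispersion entropy with parameters in the recommended ranges (the data are hypothetical):
using ComplexityMeasures
x = randn(2000)                    # hypothetical timeseries (at least 1000 points recommended)
o = Dispersion(c = 4, m = 3)
h = information(Shannon(), o, x)   # dispersion entropy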
Transfer operator
ComplexityMeasures.TransferOperator — TypeTransferOperator <: OutcomeSpace
TransferOperator(b::AbstractBinning; warn_precise = true, rng = Random.default_rng())An OutcomeSpace based on binning data into rectangular boxes dictated by the given binning scheme b.
When used with probabilities, then the transfer (Perron-Frobenius) operator is approximated over the bins, then bin probabilities are estimated as the invariant measure associated with that transfer operator. Assumes that the input data are sequential (time-ordered).
This implementation follows the grid estimator approach in Diego et al. (2019).
Precision
The default behaviour when using RectangularBinning or FixedRectangularBinning is to accept some loss of precision on the bin boundaries for speed-ups, but this may lead to issues for TransferOperator where some points may be encoded as the symbol -1 ("outside the binning"). The warn_precise keyword controls whether the user is warned when a less precise binning is used.
Outcome space
The outcome space for TransferOperator is the set of unique bins constructed from b. Bins are identified by their left (lowest-value) corners, are given in data units, and are returned as SVectors.
Bin ordering
Bins returned by probabilities_and_outcomes are ordered according to first appearance (i.e. the first time the input (multivariate) timeseries visits the bin). Thus, if
b = RectangularBinning(4)
est = TransferOperator(b)
probs, outcomes = probabilities_and_outcomes(est, x) # x is some timeseries
then probs[i] is the invariant measure (probability) of the bin outcomes[i], which is the i-th bin visited by the timeseries with nonzero measure.
Description
The transfer operator $P^{N}$ is computed as an N-by-N matrix of transition probabilities between the states defined by the partition elements, where N is the number of boxes in the partition that is visited by the orbit/points.
If $\{x_n\}_{n=1}^L$ are the $L$ different $D$-dimensional points over which the transfer operator is approximated, $\{ C_k \}_{k=1}^N$ are the $N$ different partition elements (as dictated by ϵ) that get visited by the points, and $\phi(x_n) = x_{n+1}$, then
\[P_{ij} = \dfrac {\#\{ x_n | \phi(x_n) \in C_j \cap x_n \in C_i \}} {\#\{ x_m | x_m \in C_i \}},\]
where $\#$ denotes the cardinality. The element $P_{ij}$ thus indicates the fraction of points that are initially in box $C_i$ and end up in box $C_j$ when the points in $C_i$ are projected one step forward in time. Thus, the entries $P_{ik}$ of the i-th row, where $k \in \{1, 2, \ldots, N \}$, give the probabilities of jumping from the state defined by box $C_i$ to any of the other $N$ states. It follows that $\sum_{k=1}^{N} P_{ik} = 1$ for all $i$. Thus, $P^N$ is a row/right stochastic matrix.
Invariant measure estimation from transfer operator
The left invariant distribution $\mathbf{\rho}^N$ is a row vector, where $\mathbf{\rho}^N P^{N} = \mathbf{\rho}^N$. Hence, $\mathbf{\rho}^N$ is a row eigenvector of the transfer matrix $P^{N}$ associated with eigenvalue 1. The distribution $\mathbf{\rho}^N$ approximates the invariant density of the system subject to binning, and can be taken as a probability distribution over the partition elements.
In practice, the invariant measure $\mathbf{\rho}^N$ is computed using invariantmeasure, which also approximates the transfer matrix. The invariant distribution is initialized as a length-N random distribution which is then applied to $P^{N}$. For reproducibility in this step, set the rng. The resulting length-N distribution is then applied to $P^{N}$ again. This process repeats until the difference between the distributions over consecutive iterations is below some threshold.
See also: RectangularBinning, FixedRectangularBinning, invariantmeasure.
Utility methods/types
ComplexityMeasures.InvariantMeasure — TypeInvariantMeasure(to, ρ)Minimal return struct for invariantmeasure that contains the estimated invariant measure ρ, as well as the transfer operator to from which it is computed (including bin information).
See also: invariantmeasure.
ComplexityMeasures.invariantmeasure — Functioninvariantmeasure(x::AbstractStateSpaceSet, binning::RectangularBinning;
rng = Random.default_rng()) → iv::InvariantMeasureEstimate an invariant measure over the points in x based on binning the data into rectangular boxes dictated by the binning, then approximate the transfer (Perron-Frobenius) operator over the bins. From the approximation to the transfer operator, compute an invariant distribution over the bins. Assumes that the input data are sequential.
Details on the estimation procedure are found in the TransferOperator docstring.
Example
using DynamicalSystems
henon_rule(x, p, n) = SVector{2}(1.0 - p[1]*x[1]^2 + x[2], p[2]*x[1])
henon = DeterministicIteratedMap(henon_rule, zeros(2), [1.4, 0.3])
orbit, t = trajectory(henon, 20_000; Ttr = 10)
# Estimate the invariant measure over some coarse graining of the orbit.
iv = invariantmeasure(orbit, RectangularBinning(15))
# Get the probabilities and bins
invariantmeasure(iv)Probabilities and bin information
invariantmeasure(iv::InvariantMeasure) → (ρ::Probabilities, bins::Vector{<:SVector})From a pre-computed invariant measure, return the probabilities and associated bins. The element ρ[i] is the probability of visitation to the box bins[i].
Why bother with the transfer operator instead of using regular histograms to obtain probabilities?
In fact, the naive histogram approach and the transfer operator approach are equivalent in the limit of long enough time series (as $n \to \infty$), which is guaranteed by the ergodic theorem. There is a crucial difference, however:
The naive histogram approach only gives the long-term probabilities that orbits visit a certain region of the state space. The transfer operator encodes that information too, but comes with the added benefit of knowing the transition probabilities between states (see transfermatrix).
See also: InvariantMeasure.
ComplexityMeasures.transfermatrix — Functiontransfermatrix(iv::InvariantMeasure) → (M::AbstractArray{<:Real, 2}, bins::Vector{<:SVector})Return the transfer matrix/operator and corresponding bins. Here, bins[i] corresponds to the i-th row/column of the transfer matrix. Thus, the entry M[i, j] is the probability of jumping from the state defined by bins[i] to the state defined by bins[j].
See also: TransferOperator.
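Continuing from the invariantmeasure example above, a minimal sketch of inspecting the transfer matrix itself:
M, bins = transfermatrix(iv)   # iv is the InvariantMeasure computed above
M[1, :]                        # transition probabilities out of the first bin (the row sums to 1)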
ComplexityMeasures.transferoperator — Functiontransferoperator(pts::StateSpaceSet, binning; kw...)Approximate the transfer operator given a set of sequentially ordered points subject to a rectangular partition given by the binning. The keywords boundary_condition = :none, warn_precise = true are as in TransferOperator.
Kernel density
ComplexityMeasures.NaiveKernel — TypeNaiveKernel(ϵ::Real; method = KDTree, w = 0, metric = Euclidean()) <: OutcomeSpaceAn OutcomeSpace based on a "naive" kernel density estimation approach (KDE), as discussed in Prichard and Theiler (1995).
Probabilities $P(\mathbf{x}, \epsilon)$ are assigned to every point $\mathbf{x}$ by counting how many other points occupy the space spanned by a hypersphere of radius ϵ around $\mathbf{x}$, according to:
\[P_i( X, \epsilon) \approx \dfrac{1}{N} \sum_{s} B(||X_i - X_s|| < \epsilon),\]
where $B$ gives 1 if the argument is true. Probabilities are then normalized.
Keyword arguments
- method = KDTree: the search structure supported by Neighborhood.jl. Specifically, use KDTree for a tree-based neighbor search, or BruteForce for the direct distances between all points. KDTrees heavily outperform direct distances when the dimensionality of the data is much smaller than the data length.
- w = 0: the Theiler window, which excludes indices $s$ that are within $|i - s| ≤ w$ from the given point $x_i$.
- metric = Euclidean(): the distance metric.
Outcome space
The outcome space Ω for NaiveKernel are the indices of the input data, eachindex(x). Hence, input x is needed for a well-defined outcome_space. The reason to not return the data points themselves is that duplicate data points may not get assigned the same probabilities (due to having different neighbors).
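A minimal sketch with a hypothetical 2D dataset and radius:
using ComplexityMeasures
x = StateSpaceSet(rand(2000, 2))            # hypothetical 2D dataset
probs = probabilities(NaiveKernel(0.2), x)  # one probability per point in x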
Timescales
ComplexityMeasures.WaveletOverlap — TypeWaveletOverlap([wavelet]) <: OutcomeSpaceAn OutcomeSpace based on the maximal overlap discrete wavelet transform (MODWT).
When used with probabilities, the MODWT is applied to a signal, then probabilities are computed as the (normalized) energies at different wavelet scales. These probabilities are used to compute the wavelet entropy according to Rosso et al. (2001). Input timeseries x is needed for a well-defined outcome space.
By default the wavelet Wavelets.WT.Daubechies{12}() is used. Otherwise, you may choose a wavelet from the Wavelets package (it must subtype OrthoWaveletClass).
Outcome space
The outcome space for WaveletOverlap are the integers 1, 2, …, N enumerating the wavelet scales. To obtain a better understanding of what these mean, we prepared a notebook you can view online. As such, this estimator only works for timeseries input and input x is needed for a well-defined outcome_space.
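A minimal sketch of computing the wavelet entropy of a hypothetical signal with the default wavelet:
using ComplexityMeasures
x = randn(2^12)                                   # hypothetical timeseries
h = information(Shannon(), WaveletOverlap(), x)   # wavelet entropy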
ComplexityMeasures.PowerSpectrum — TypePowerSpectrum() <: OutcomeSpaceAn OutcomeSpace based on the power spectrum of a timeseries (amplitude square of its Fourier transform).
If used with probabilities, then the spectrum normalized to sum = 1 is returned as probabilities. The Shannon entropy of these probabilities is typically referred to in the literature as spectral entropy, e.g. Llanos et al. (2017) and Tian et al. (2017).
The closer the spectrum is to flat, i.e., white noise, the higher the entropy. However, you can't compare entropies of timeseries with different length, because the binning in spectral space depends on the length of the input.
Outcome space
The outcome space Ω for PowerSpectrum is the set of frequencies in Fourier space. They should be multiplied with the sampling rate of the signal, which is assumed to be 1. Input x is needed for a well-defined outcome_space.
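A minimal sketch of computing the spectral entropy of a hypothetical noisy sine signal:
using ComplexityMeasures
t = range(0, 10; length = 1000)
x = sin.(2π .* 5 .* t) .+ 0.1 .* randn(1000)     # hypothetical noisy sine wave
h = information(Shannon(), PowerSpectrum(), x)   # spectral entropy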
Cosine similarity binning
ComplexityMeasures.CosineSimilarityBinning — TypeCosineSimilarityBinning(; m::Int, τ::Int, nbins::Int)An OutcomeSpace based on the cosine similarity (Wang et al., 2020).
It can be used with information to compute the "diversity entropy" of an input timeseries (Wang et al., 2020).
The implementation here allows for τ != 1, which was not considered in the original paper.
Description
CosineSimilarityBinning probabilities are computed as follows.
- From the input time series x, using embedding lag τ and embedding dimension m, construct the embedding $Y = \{\bf x_i \} = \{(x_{i}, x_{i+\tau}, x_{i+2\tau}, \ldots, x_{i+m\tau - 1})\}_{i = 1}^{N-m\tau}$.
- Compute $D = \{d(\bf x_t, \bf x_{t+1}) \}_{t=1}^{N-m\tau-1}$, where $d(\cdot, \cdot)$ is the cosine similarity between two m-dimensional vectors in the embedding.
- Divide the interval [-1, 1] into nbins equally sized subintervals (including the value +1).
- Construct a histogram of cosine similarities $d \in D$ over those subintervals.
- Sum-normalize the histogram to obtain probabilities.
Outcome space
The outcome space for CosineSimilarityBinning is the bins of the [-1, 1] interval, and the return configuration is the same as in ValueBinning (left bin edge).
Implements
codify. Used for encoding inputs where ordering matters (e.g. time series).
ComplexityMeasures.Diversity — TypeDiversityAn alias to CosineSimilarityBinning.
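A minimal sketch of computing the "diversity entropy" of a hypothetical signal (the parameter values are illustrative only):
using ComplexityMeasures
x = randn(5000)                                        # hypothetical timeseries
o = CosineSimilarityBinning(m = 3, τ = 1, nbins = 20)
h = information(Shannon(), o, x)                       # diversity entropy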
Sequential pair distances
ComplexityMeasures.SequentialPairDistances — TypeSequentialPairDistances <: CountBasedOutcomeSpace
SequentialPairDistances(x::AbstractVector; n = 3, metric = Chebyshev(), m = 3, τ = 1)
SequentialPairDistances(x::AbstractStateSpaceSet; n = 3, metric = Chebyshev())An outcome space based on the distribution of distances of sequential pairs of points.
This outcome space appears implicitly as part of the "distribution entropy" introduced by Li et al. (2015), which of course can be reproduced here (see example below). We've generalized the method to be used with any InformationMeasure and DiscreteInfoEstimator, and with any valid distance metric (from Distances.jl).
Input data x are needed for initialization, because distances must be pre-computed to know the minimum/maximum distances needed for binning the distribution of pairwise distances. If the input is an AbstractVector, then the vector is embedded before computing distances. If the input is an AbstractStateSpaceSet, then the embedding step is skipped and distances are computed directly on each state vector xᵢ ∈ x.
Description
SequentialPairDistances does the following:
- Transforms the input timeseries x by first embedding it using embedding dimension m and embedding lag τ (or skips this step if the input is already embedded).
- Computes the distances ds between sequential pairs of points according to the given metric.
- Divides the interval [minimum(ds), maximum(ds)] into n equal-size bins by using RectangularBinEncoding, then maps the distances onto these bins.
Outcome space
The outcome space Ω for SequentialPairDistances are the bins onto which the pairwise distances are mapped, encoded as the integers 1:n. If you need the actual bin coordinates, these can be recovered with decode (see example below).
Implements
codify. Note that the input x is ignored when calling codify, because the input data is already handled when constructing a SequentialPairDistances.
Examples
The outcome bins can be retrieved as follows.
using ComplexityMeasures
x = rand(100)
o = SequentialPairDistances(x)
cts, outs = counts_and_outcomes(o, x)Computing the "distribution entropy" with n = 3 bins for the distance histogram:
using ComplexityMeasures
x = rand(1000000)
o = SequentialPairDistances(x, n = 3, metric = Chebyshev()) # metric from original paper
h = information(Shannon(base = 2), o, x)Bubble sort swaps
ComplexityMeasures.BubbleSortSwaps — TypeBubbleSortSwaps <: CountBasedOutcomeSpace
BubbleSortSwaps(; m = 3, τ = 1)The BubbleSortSwaps outcome space is based on Manis et al. (2017)'s paper on "bubble entropy".
Description
BubbleSortSwaps does the following:
- Embeds the input data using embedding dimension m and embedding lag τ.
- For each state vector in the embedding, counts how many swaps the bubble sort algorithm needs to sort it.
For counts_and_outcomes, we then define a distribution over the number of necessary swaps. This distribution can then be used to estimate probabilities using probabilities_and_outcomes, which again can be used to estimate any InformationMeasure. An example of how to compute the "Shannon bubble entropy" is given below.
Outcome space
The outcome_space for BubbleSortSwaps are the integers 0:N, where N = (m * (m - 1)) / 2 is the worst-case number of swaps. Hence, the number of total_outcomes is N + 1.
Implements
codify. Returns the number of swaps required for each embedded state vector.
Examples
With the BubbleSortSwaps outcome space, we can easily compute a "bubble entropy" inspired by (Manis et al., 2017). Note: this is not actually a new entropy - it is just a new way of discretizing the input data. To reproduce the bubble entropy complexity measure from (Manis et al., 2017), see BubbleEntropy.
Examples
using ComplexityMeasures
x = rand(100000)
o = BubbleSortSwaps(; m = 5) # 5-dimensional embedding vectors
information(Shannon(; base = 2), o, x)
# We can also compute any other "bubble quantity", for example the
# "Tsallis bubble extropy", with arbitrary probabilities estimators:
information(TsallisExtropy(), BayesianRegularization(), o, x)Spatial outcome spaces
ComplexityMeasures.SpatialOrdinalPatterns — TypeSpatialOrdinalPatterns <: OutcomeSpaceModel
SpatialOrdinalPatterns(stencil, x; periodic = true)A symbolic, permutation-based OutcomeSpace for spatiotemporal systems that generalises OrdinalPatterns to high-dimensional arrays. The order m of the permutation pattern is extracted from the stencil, see below.
SpatialOrdinalPatterns is based on the 2D and 3D spatiotemporal permutation entropy estimators by Ribeiro et al. (2012) and Schlemmer et al. (2018), respectively, but is here implemented as a pure probabilities estimator that is generalized for D-dimensional input array x, with arbitrary regions (stencils) to get patterns from and (possibly) periodic boundary conditions.
See below for ways to specify the stencil. If periodic = true, then the stencil wraps around at the ends of the array. If false, then collected regions with indices which exceed the array bounds are skipped.
In combination with information and information_normalized, this probabilities estimator can be used to compute generalized spatiotemporal permutation InformationMeasure of any type.
Outcome space
The outcome space Ω for SpatialOrdinalPatterns is the set of length-m ordinal patterns (i.e. permutations) that can be formed by the integers 1, 2, …, m, ordered lexicographically. There are factorial(m) such patterns. Here m refers to the number of points included in stencil.
Stencils
The stencil defines what local area to use to group hypervoxels. Each grouping of hypervoxels is mapped to an order-m permutation pattern, which is then mapped to an integer as in OrdinalPatterns. The stencil is moved around the input array, in a sense "scanning" the input array, to collect all possible groupings allowed by the boundary condition (periodic or not).
Stencils are passed in one of the following three ways:
1. As vectors of CartesianIndex which encode the offset of indices to include in the stencil, with respect to the current array index when scanning over the array. For example stencil = CartesianIndex.([(0,0), (0,1), (1,1), (1,0)]). Don't forget to include the zero offset index if you want to include the hypervoxel itself, which is almost always the case. Here the stencil creates a 2x2 square extending to the bottom and right of the pixel (directions here correspond to the way Julia prints matrices by default). When passing a stencil as a vector of CartesianIndex, m = length(stencil).
2. As a D-dimensional array (where D matches the dimensionality of the input data) containing 0s and 1s, where if stencil[index] == 1, the corresponding pixel is included, and if stencil[index] == 0, it is not included. To generate the same estimator as in 1., use stencil = [1 1; 1 1]. When passing a stencil as a D-dimensional array, m = sum(stencil).
3. As a Tuple containing two Tuples, both of length D, for D-dimensional data. The first tuple specifies the extent of the stencil, where extent[i] dictates the number of hypervoxels to be included along the i-th axis and lag[i] the separation of hypervoxels along the same axis. This method can only generate (hyper)rectangular stencils. To create the same estimator as in the previous examples, use here stencil = ((2, 2), (1, 1)). When passing a stencil using extent and lag, m = prod(extent).
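A minimal sketch of spatial permutation entropy for a hypothetical 2D array, using the CartesianIndex stencil from the first item above:
using ComplexityMeasures
x = rand(50, 50)                                        # hypothetical 2D "image"
stencil = CartesianIndex.([(0,0), (0,1), (1,1), (1,0)])
est = SpatialOrdinalPatterns(stencil, x)
h = information(Shannon(), est, x)                      # spatial permutation entropy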
ComplexityMeasures.SpatialDispersion — TypeSpatialDispersion <: OutcomeSpace
SpatialDispersion(stencil, x::AbstractArray;
periodic = true,
c = 5,
skip_encoding = false,
L = nothing,
)A dispersion-based OutcomeSpace that generalises Dispersion for input data that are high-dimensional arrays.
SpatialDispersion is based on Azami et al. (2019)'s 2D square dispersion (Shannon) entropy estimator, but is here implemented as a pure probabilities estimator that is generalized for N-dimensional input data x, with arbitrary neighborhood regions (stencils) and (optionally) periodic boundary conditions.
In combination with information and information_normalized, this probabilities estimator can be used to compute (normalized) generalized spatiotemporal dispersion InformationMeasure of any type.
Arguments
- stencil. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). The examples below demonstrate different ways of specifying stencils. For details, see SpatialOrdinalPatterns.
- x::AbstractArray. The input data. Must be provided because we need to know its size for optimization and bound checking.
Keyword arguments
- periodic::Bool. If periodic == true, then the stencil should wrap around at the end of the array. If periodic = false, then pixels whose stencil exceeds the array bounds are skipped.
- c::Int. Determines how many discrete categories to use for the Gaussian encoding.
- skip_encoding. If skip_encoding == true, encoding is ignored, and dispersion patterns are computed directly from x, under the assumption that L is the alphabet length for x. Thus, if skip_encoding == true, then L must also be specified. This is useful for categorical or integer-valued data.
- L. If L == nothing (default), then the number of total outcomes is inferred from stencil and encoding. If L is set to an integer, then the data is considered pre-encoded and the number of total outcomes is set to L.
Outcome space
The outcome space for SpatialDispersion is the unique delay vectors whose elements are the symbols (integers) encoded by the Gaussian CDF. Hence, the outcome space is all m-dimensional delay vectors whose elements are all possible values in 1:c. There are c^m such vectors.
Description
Estimating probabilities/entropies from higher-dimensional data is conceptually simple.
- Discretize each value (hypervoxel) in x relative to all other values xᵢ ∈ x using the provided encoding scheme.
- Use stencil to extract relevant (discretized) points around each hypervoxel.
- Construct a symbol from these points.
- Take the sum-normalized histogram of the symbols as a probability distribution.
- Optionally, compute information or information_normalized from this probability distribution.
Usage
Here's how to compute spatial dispersion entropy using the three different ways of specifying stencils.
using ComplexityMeasures
x = rand(50, 50) # first "time slice" of a spatial system evolution
# Cartesian stencil
stencil_cartesian = CartesianIndex.([(0,0), (1,0), (1,1), (0,1)])
est = SpatialDispersion(stencil_cartesian, x)
information_normalized(est, x)
# Extent/lag stencil
extent = (2, 2); lag = (1, 1); stencil_ext_lag = (extent, lag)
est = SpatialDispersion(stencil_ext_lag, x)
information_normalized(est, x)
# Matrix stencil
stencil_matrix = [1 1; 1 1]
est = SpatialDispersion(stencil_matrix, x)
information_normalized(est, x)To apply this to timeseries of spatial data, simply loop over the call (broadcast), e.g.:
imgs = [rand(50, 50) for i = 1:100]; # one image per second over 100 seconds
stencil = ((2, 2), (1, 1)) # a 2x2 stencil (i.e. dispersion patterns of length 4)
est = SpatialDispersion(stencil, first(imgs))
h_vs_t = information_normalized.(Ref(est), imgs)Computing generalized spatiotemporal dispersion entropy is trivial, e.g. with Renyi:
x = reshape(repeat(1:5, 500) .+ 0.1*rand(500*5), 50, 50)
est = SpatialDispersion(stencil, x)
information(Renyi(q = 2), est, x)See also: SpatialOrdinalPatterns, GaussianCDFEncoding, codify.
ComplexityMeasures.SpatialBubbleSortSwaps — TypeSpatialBubbleSortSwaps <: SpatialOutcomeSpace
SpatialBubbleSortSwaps(stencil, x; periodic = true)SpatialBubbleSortSwaps generalizes BubbleSortSwaps to high-dimensional arrays by encoding pixel/voxel/hypervoxel windows in terms of how many swap operations the bubble sort algorithm requires to sort them.
What does this mean? For BubbleSortSwaps the input data is embedded using embedding dimension m and the number of swaps required are computed for each embedding vector. For SpatialBubbleSortSwaps, the "embedding dimension" m is inferred from the number of elements in the stencil, and the "embedding vectors" are the hypervoxels selected by the stencil.
Outcome space
The outcome space Ω for SpatialBubbleSortSwaps is the range of integers 0:(m*(m-1)÷2), where m is the number of elements in the stencil, corresponding to the number of swaps required by the bubble sort algorithm to sort a particular pixel/voxel/hypervoxel window.
Arguments
- stencil. Defines what local area (hyperrectangle), or which points within this area, to include around each hypervoxel (i.e. pixel in 2D). See SpatialOrdinalPatterns and SpatialDispersion for more information about stencils and examples of how to specify them.
- x::AbstractArray. The input data. Must be provided because we need to know its size for optimization and bound checking.
Keyword arguments
- periodic::Bool. If periodic == true, then the stencil should wrap around at the end of the array. If periodic = false, then pixels whose stencil exceeds the array bounds are skipped.
Example
using ComplexityMeasures
using Random; rng = MersenneTwister(1234)
x = rand(rng, 100, 100, 100) # some 3D image
stencil = zeros(Int,2,2,2) # 3D stencil
stencil[:, :, 1] = [1 0; 1 1]
stencil[:, :, 2] = [0 1; 1 0]
o = SpatialBubbleSortSwaps(stencil, x)
# Distribution of "bubble sorting complexity" among voxel windows
counts_and_outcomes(o, x)
# "Spatial bubble Kaniadakis entropy", with shrinkage-adjusted probabilities
information(Kaniadakis(), Shrinkage(), o, x)Probabilities and related functions
ComplexityMeasures.Probabilities — TypeProbabilities <: Array{<:AbstractFloat, N}
Probabilities(probs::Array [, outcomes [, dimlabels]]) → p
Probabilities(counts::Counts [, outcomes [, dimlabels]]) → pProbabilities stores an N-dimensional array of probabilities, while ensuring that the array sums to 1 (normalized probability mass). In most cases the array is a standard vector. p itself can be manipulated and iterated over, just like its stored array.
The probabilities correspond to outcomes that describe the axes of the array. If p isa Probabilities, then p.outcomes[i] is an abstract vector containing the outcomes along the i-th dimension. The outcomes have the same ordering as the probabilities, so that p[i][j] is the probability for outcome p.outcomes[i][j]. The dimensions of the array are named, and can be accessed by p.dimlabels, where p.dimlabels[i] is the label of the i-th dimension. Both outcomes and dimlabels are assigned automatically if not given. If the input is a set of Counts, and outcomes and dimlabels are not given, then the labels and outcomes are inherited from the counts.
Examples
julia> probs = [0.2, 0.2, 0.2, 0.2]; Probabilities(probs) # will be normalized to sum to 1
Probabilities{Float64,1} over 4 outcomes
Outcome(1) 0.25
Outcome(2) 0.25
Outcome(3) 0.25
Outcome(4) 0.25julia> c = Counts([12, 16, 12], ["out1", "out2", "out3"]); Probabilities(c)
Probabilities{Float64,1} over 3 outcomes
"out1" 0.3
"out2" 0.4
"out3" 0.3ComplexityMeasures.probabilities — Functionprobabilities(
[est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → p::ProbabilitiesCompute the same probabilities as in the probabilities_and_outcomes function, with two differences:
- Do not explicitly return the outcomes.
- If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.
probabilities([est::ProbabilitiesEstimator], counts::Counts) → p::ProbabilitiesThe same as above, but estimate the probabilities directly from a set of Counts.
ComplexityMeasures.probabilities_and_outcomes — Functionprobabilities_and_outcomes(
[est::ProbabilitiesEstimator], o::OutcomeSpace, x::Array_or_SSSet
) → (p::Probabilities, Ω)Estimate a probability distribution over the set of possible outcomes Ω defined by the OutcomeSpace o, given input data x. Probabilities are estimated according to the given probabilities estimator est, which defaults to RelativeAmount.
The input data is typically an Array or a StateSpaceSet (or SSSet for short); see Input data for ComplexityMeasures.jl. Configuration options are always given as arguments to the chosen outcome space and probabilities estimator.
Return a tuple where the first element is a Probabilities instance, which is vector-like and contains the probabilities, and where the second element Ω are the outcomes corresponding to the probabilities, such that p[i] is the probability for the outcome Ω[i].
The outcomes are actually included in p, and you can use the outcomes function on p to get them. probabilities_and_outcomes returns both for backwards compatibility.
probabilities_and_outcomes(
[est::ProbabilitiesEstimator], counts::Counts
) → (p::Probabilities, Ω)Estimate probabilities from the pre-computed counts using the given ProbabilitiesEstimator est.
Description
Probabilities are computed by:
- Discretizing/encoding x into a finite set of outcomes Ω specified by the provided OutcomeSpace o.
- Assigning to each outcome Ωᵢ ∈ Ω either a count (how often it appears among the discretized data points), or a pseudo-count (some pre-normalized probability such that sum(Ωᵢ for Ωᵢ in Ω) == 1).
For outcome spaces that result in pseudo counts, such as PowerSpectrum, these pseudo counts are simply treated as probabilities and returned directly (that is, est is ignored). For counting-based outcome spaces (see OutcomeSpace docstring), probabilities are estimated from the counts using some ProbabilitiesEstimator (first signature).
Observed vs all probabilities
Due to performance optimizations, whether the returned probabilities contain 0s as entries or not depends on the outcome space. E.g., in ValueBinning 0s are skipped, while in PowerSpectrum 0s are not skipped, because we get them for free.
Use allprobabilities_and_outcomes to guarantee that zero probabilities are also returned (may be slower).
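A minimal sketch of the difference, with hypothetical data deliberately kept short so that some ordinal patterns are likely missing:
using ComplexityMeasures
x = randn(20)                                       # hypothetical short timeseries
o = OrdinalPatterns{4}()
p, Ω = probabilities_and_outcomes(o, x)             # only observed outcomes; no zero entries
p_all, Ω_all = allprobabilities_and_outcomes(o, x)  # all factorial(4) = 24 outcomes; zeros included
length(Ω) ≤ length(Ω_all)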
ComplexityMeasures.allprobabilities_and_outcomes — Functionallprobabilities_and_outcomes(est::ProbabilitiesEstimator, x::Array_or_SSSet) → (p::Probabilities, outs)
allprobabilities_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (p::Probabilities, outs)The same as probabilities_and_outcomes, but ensures that outcomes with 0 probability are explicitly added in the returned vector. This means that p[i] is the probability of ospace[i], with ospace = outcome_space(est, x).
This function is useful in cases where one wants to compare the probability mass functions of two different input data x, y under the same estimator. E.g., computing the KL-divergence of the two PMFs assumes that they obey the same indexing. This is not true for probabilities even with the same est, due to the skipping of 0 entries, but it is true for allprobabilities_and_outcomes.
ComplexityMeasures.probabilities! — Functionprobabilities!(s, args...)Similar to probabilities(args...), but allows pre-allocation of temporarily used containers s.
Only works for certain estimators. See for example OrdinalPatterns.
ComplexityMeasures.missing_probabilities — Functionmissing_probabilities([est::ProbabilitiesEstimator], o::OutcomeSpace, x)Same as missing_outcomes, but defines a "missing outcome" as an outcome having 0 probability according to est.
Counts
ComplexityMeasures.Counts — TypeCounts <: Array{<:Integer, N}
Counts(counts [, outcomes [, dimlabels]]) → cCounts stores an N-dimensional array of integer counts corresponding to a set of outcomes. This is typically called a "frequency table" or "contingency table".
If c isa Counts, then c.outcomes[i] is an abstract vector containing the outcomes along the i-th dimension, where c[i][j] is the count corresponding to the outcome c.outcomes[i][j], and c.dimlabels[i] is the label of the i-th dimension. Both labels and outcomes are assigned automatically if not given. c itself can be manipulated and iterated over like its stored array.
ComplexityMeasures.counts_and_outcomes — Functioncounts_and_outcomes(o::OutcomeSpace, x) → (cts::Counts, Ω)Discretize/encode x (which must be sortable) into a finite set of outcomes Ω specified by the provided OutcomeSpace o, and then count how often each outcome Ωᵢ ∈ Ω (i.e. each "discretized value", or "encoded symbol") appears.
Return a tuple where the first element is a Counts instance, which is vector-like and contains the counts, and where the second element Ω are the outcomes corresponding to the counts, such that cts[i] is the count for the outcome Ω[i].
The outcomes are actually included in cts, and you can use the outcomes function on cts to get them. counts_and_outcomes returns both for backwards compatibility.
counts_and_outcomes(x) → cts::CountsIf no OutcomeSpace is specified, then UniqueElements is used as the outcome space.
Description
For OutcomeSpaces that use encode to discretize, it is possible to count how often each outcome $\omega_i \in \Omega$, where $\Omega$ is the set of possible outcomes, is observed in the discretized/encoded input data. Thus, we can assign to each outcome $\omega_i$ a count $f(\omega_i)$, such that $\sum_{i=1}^N f(\omega_i) = N$, where $N$ is the number of observations in the (encoded) input data. counts_and_outcomes returns the counts $f(\omega_i)_{obs}$ and outcomes only for the observed outcomes $\omega_i^{obs}$ (those outcomes that actually appear in the input data). If you need the counts for unobserved outcomes as well, use allcounts_and_outcomes.
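A minimal sketch contrasting counts over observed outcomes with counts over the full outcome space (the data are hypothetical):
using ComplexityMeasures
x = randn(50)                                 # hypothetical short timeseries
o = OrdinalPatterns{3}()
cts, Ω = counts_and_outcomes(o, x)            # counts only for the observed patterns
allcts, Ω_all = allcounts_and_outcomes(o, x)  # counts for all factorial(3) = 6 patterns (zeros included)
sum(cts) == sum(allcts)                       # both sum to the number of encoded points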
ComplexityMeasures.counts — Functioncounts(o::OutcomeSpace, x) → cts::CountsCompute the same counts as in the counts_and_outcomes function, with two differences:
- Do not explicitly return the outcomes.
- If the outcomes are not estimated for free while estimating the counts, a special integer type is used to enumerate the outcomes, to avoid the computational cost of estimating the outcomes.
ComplexityMeasures.allcounts_and_outcomes — Functionallcounts_and_outcomes(o::OutcomeSpace, x::Array_or_SSSet) → (cts::Counts{<:Integer, 1}, Ω)Like counts_and_outcomes, but ensures that all outcomes Ωᵢ ∈ Ω, where Ω = outcome_space(o, x), are included.
Outcomes that do not occur in the data x get a 0 count.
ComplexityMeasures.is_counting_based — Functionis_counting_based(o::OutcomeSpace)Return true if the OutcomeSpace o is counting-based, and false otherwise.
Probability estimators
ComplexityMeasures.ProbabilitiesEstimator — TypeProbabilitiesEstimatorThe supertype for all probabilities estimators.
The role of the probabilities estimator is to convert (pseudo-)counts to probabilities. Currently, the implementations of all probabilities estimators assume a finite outcome space with known cardinality. Therefore, ProbabilitiesEstimators accept an OutcomeSpace as the first argument, which specifies the set of possible outcomes.
Probabilities estimators are used with probabilities and allprobabilities_and_outcomes.
Implementations
The default probabilities estimator is RelativeAmount, which is compatible with any OutcomeSpace. The following estimators only support counting-based outcomes.
Description
In ComplexityMeasures.jl, probability mass functions are estimated from data by defining a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (by specifying an OutcomeSpace), and assigning to each outcome $\omega_i$ a probability $p(\omega_i)$, such that $\sum_{i=1}^N p(\omega_i) = 1$ (by specifying a ProbabilitiesEstimator).
ComplexityMeasures.RelativeAmount — TypeRelativeAmount <: ProbabilitiesEstimator
RelativeAmount()The RelativeAmount estimator is used with probabilities and related functions to estimate probabilities over the given OutcomeSpace using maximum likelihood estimation (MLE), also called plug-in estimation. See ProbabilitiesEstimator for usage.
Description
Consider a length-m outcome space $\Omega$ and random sample of length N. The maximum likelihood estimate of the probability of the k-th outcome $\omega_k$ is
\[p(\omega_k) = \dfrac{n_k}{N},\]
where $n_k$ is the number of times the k-th outcome was observed in the (encoded) sample.
This estimation is known as maximum likelihood estimation. However, RelativeAmount also serves as the fall-back probabilities estimator for OutcomeSpaces that are not count-based and only yield "pseudo-counts", for example WaveletOverlap or PowerSpectrum. These outcome spaces do not yield counts, but pre-normalized numbers that can be treated as "relative frequencies" or "relative power". Hence, this estimator is called RelativeAmount.
Examples
using ComplexityMeasures
x = cumsum(randn(100))
ps = probabilities(OrdinalPatterns{3}(), x) # `RelativeAmount` is the default estimator
ps_mle = probabilities(RelativeAmount(), OrdinalPatterns{3}(), x) # equivalent
ps == ps_mle # trueSee also: BayesianRegularization, Shrinkage.
ComplexityMeasures.BayesianRegularization — TypeBayesianRegularization <: ProbabilitiesEstimator
BayesianRegularization(; a = 1.0)The BayesianRegularization estimator is used with probabilities and related functions to estimate probabilities over an m-element counting-based OutcomeSpace using Bayesian regularization of cell counts (Hausser and Strimmer, 2009). See ProbabilitiesEstimator for usage.
Outcome space requirements
This estimator only works with counting-compatible outcome spaces.
Description
The BayesianRegularization estimator estimates the probability of the $k$-th outcome $\omega_{k}$ as
\[p(\omega_{k}) = \dfrac{n_k + a_k}{n + A},\]
where $n$ is the number of samples in the input data, $n_k$ is the observed count for the outcome $\omega_{k}$, $a_k$ is the prior weight assigned to $\omega_{k}$, and $A = \sum_{i=1}^m a_i$.
Picking a
There are many common choices of priors, some of which are listed in Hausser and Strimmer (2009). They include
- a == 0, which is equivalent to the RelativeAmount estimator.
- a == 0.5 (Jeffreys prior)
- a == 1 (Bayes-Laplace uniform prior)
a can also be chosen as a vector of real numbers. Then, if used with allprobabilities_and_outcomes, it is required that length(a) == total_outcomes(o, x), where x is the input data and o is the OutcomeSpace. If used with probabilities, then length(a) must match the number of observed outcomes (you can check this using probabilities_and_outcomes). The choice of a can severely impact the estimation errors of the probabilities, and the errors depend both on the choice of a and on the sampling scenario (Hausser and Strimmer, 2009).
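As a hypothetical sketch of the vector-valued case (assuming OrdinalPatterns{3}, for which total_outcomes gives factorial(3) = 6):
using ComplexityMeasures

x = cumsum(randn(100))
o = OrdinalPatterns{3}()
a = fill(0.5, total_outcomes(o, x))  # one prior weight per possible outcome (6 here)
ps, Ω = allprobabilities_and_outcomes(BayesianRegularization(a = a), o, x)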
Assumptions
The BayesianRegularization estimator assumes a fixed and known m. Thus, using it with probabilities_and_outcomes and allprobabilities_and_outcomes will yield different results, depending on whether all outcomes are observed in the input data or not. For probabilities_and_outcomes, m is the number of observed outcomes. For allprobabilities_and_outcomes, m = total_outcomes(o, x), where o is the OutcomeSpace and x is the input data.
If used with allprobabilities_and_outcomes, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.
Examples
using ComplexityMeasures
x = cumsum(randn(100))
ps_bayes = probabilities(BayesianRegularization(a = 0.5), OrdinalPatterns{3}(), x)See also: RelativeAmount, Shrinkage.
ComplexityMeasures.Shrinkage — TypeShrinkage <: ProbabilitiesEstimator
Shrinkage(; t = nothing, λ = nothing)The Shrinkage estimator is used with probabilities and related functions to estimate probabilities over the given m-element counting-based OutcomeSpace using James-Stein-type shrinkage (James and Stein, 1992), as presented in Hausser and Strimmer (2009).
Description
The Shrinkage estimator estimates a cell probability $\theta_{k}^{\text{Shrink}}$ as
\[\theta_{k}^{\text{Shrink}} = \lambda t_k + (1-\lambda) \hat{\theta}_k^{RelativeAmount},\]
where $\lambda \in [0, 1]$ is the shrinkage intensity ($\lambda = 0$ means no shrinkage, and $\lambda = 1$ means full shrinkage), and $t_k$ is the shrinkage target. Hausser and Strimmer (2009) pick $t_k = 1/m$, i.e. the uniform distribution.
If t == nothing, then $t_k$ is set to $1/m$ for all $k$, as in Hausser and Strimmer (2009). If λ == nothing (the default), then the shrinkage intensity is optimized according to Hausser and Strimmer (2009). Hence, you should probably not pick λ or t manually, unless you know what you are doing.
Assumptions
The Shrinkage estimator assumes a fixed and known number of outcomes m. Thus, using it with probabilities_and_outcomes and allprobabilities_and_outcomes will yield different results, depending on whether all outcomes are observed in the input data or not. For probabilities_and_outcomes, m is the number of observed outcomes. For allprobabilities_and_outcomes, m = total_outcomes(o, x), where o is the OutcomeSpace and x is the input data.
If used with allprobabilities_and_outcomes, then outcomes which have not been observed may be assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.
Examples
using ComplexityMeasures
x = cumsum(randn(100))
ps_shrink = probabilities(Shrinkage(), OrdinalPatterns{3}(), x)See also: RelativeAmount, BayesianRegularization.
ComplexityMeasures.AddConstant — TypeAddConstant <: ProbabilitiesEstimator
AddConstant(; c = 1.0)A generic add-constant probabilities estimator for counting-based OutcomeSpaces, where several literature estimators can be obtained by tuning c. Currently $c$ can only be a scalar.
- c = 1.0 is the Laplace estimator, or the "add-one" estimator.
Description
Probabilities for the $k$-th outcome $\omega_{k}$ are estimated as
\[p(\omega_k) = \dfrac{(n_k + c)}{n + mc},\]
where $m$ is the cardinality of the outcome space, $n$ is the number of (encoded) input data points, and $n_k$ is the number of times the outcome $\omega_{k}$ is observed in the (encoded) input data points.
If the AddConstant estimator is used with probabilities_and_outcomes, then $m$ is set to the number of observed outcomes. If used with allprobabilities_and_outcomes, then $m$ is set to the number of possible outcomes.
Looking at the formula above, if $n_k = 0$, then unobserved outcomes are assigned a non-zero probability of $\dfrac{c}{n + mc}$. This means that if the estimator is used with allprobabilities_and_outcomes, then all outcomes, even those that are not observed, are assigned non-zero probabilities. This might affect your results if using e.g. missing_outcomes.
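A minimal usage sketch, analogous to the examples for the other estimators (the add-one case c = 1.0):
using ComplexityMeasures

x = cumsum(randn(100))
# With allprobabilities_and_outcomes, unobserved outcomes get probability c/(n + mc).
ps_laplace = probabilities(AddConstant(c = 1.0), OrdinalPatterns{3}(), x)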
Encodings/Symbolizations API
Count-based OutcomeSpaces first "encode" input data into an intermediate representation indexed by the positive integers. This intermediate representation is called an "encoding". Alternative names for "encode" in the literature are "symbolize" and "codify"; in this package we use the latter.
The encodings API is defined by:
ComplexityMeasures.Encoding — TypeEncodingThe supertype for all encoding schemes. Encodings always encode elements of input data into the positive integers. The encoding API is defined by the functions encode and decode. Some probability estimators utilize encodings internally.
Currently available encodings are:
- OrdinalPatternEncoding
- GaussianCDFEncoding
- RectangularBinEncoding
- RelativeMeanEncoding
- RelativeFirstDifferenceEncoding
- UniqueElementsEncoding
- BubbleSortSwapsEncoding
- PairDistanceEncoding
- CombinationEncoding, which can combine any of the above encodings.
ComplexityMeasures.encode — Functionencode(c::Encoding, χ) -> i::IntEncode an element χ ∈ x of input data x (those given to e.g., counts) into the positive integers using encoding c. The special value of i = -1 is used as a return value for inappropriate elements χ that cannot be encoded according to c.
ComplexityMeasures.decode — Functiondecode(c::Encoding, i::Integer) -> ωDecode an encoded element i into the outcome ω ∈ Ω it corresponds to. Ω is the outcome_space that uses encoding c.
ComplexityMeasures.codify — Functioncodify(o::OutcomeSpace, x::Vector) → s::Vector{Int}Codify x according to the outcome space o.
Description
The reason this function exists is that we don't always want to encode the entire input x at once. Sometimes, it is desirable to first apply some transformation to x (the OutcomeSpace dictates this transformation), and then apply an Encoding point-wise in the transformed space. This is useful for encoding timeseries data.
The length of the returned s depends on the OutcomeSpace. Some outcome spaces preserve the input data length (e.g. UniqueElements), while others (e.g. OrdinalPatterns) first perform e.g. a delay embedding before encoding, so that length(s) < length(x).
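A small sketch of this length difference (assuming OrdinalPatterns' default lag of 1, so the delay embedding with m = 3 shortens a 100-point series to 98 encoded symbols):
using ComplexityMeasures

x = randn(100)
s1 = codify(UniqueElements(), x)      # point-wise: length(s1) == 100
s2 = codify(OrdinalPatterns{3}(), x)  # delay-embedded first: length(s2) == 98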
Available encodings
ComplexityMeasures.OrdinalPatternEncoding — TypeOrdinalPatternEncoding <: Encoding
OrdinalPatternEncoding{m}(lt = ComplexityMeasures.isless_rand)An encoding scheme that encodes length-m vectors into their permutation/ordinal patterns and then into the integers based on the Lehmer code. It is used by OrdinalPatterns and similar estimators, see that for a description of the outcome space.
The ordinal/permutation pattern of a vector χ is simply sortperm(χ), which gives the indices that would sort χ in ascending order.
Description
The Lehmer code, as implemented here, is a bijection between the set of factorial(m) possible permutations for a length-m sequence, and the integers 1, 2, …, factorial(m). The encoding step uses algorithm 1 in Berger et al. (2019), which is highly optimized. The decoding step is much slower due to missing optimizations (pull requests welcomed!).
Example
julia> using ComplexityMeasures
julia> χ = [4.0, 1.0, 9.0];
julia> c = OrdinalPatternEncoding(3);
julia> i = encode(c, χ)
3
julia> decode(c, i)
3-element SVector{3, Int64} with indices SOneTo(3):
2
1
3If you want to encode something that is already a permutation pattern, then you can use the non-exported permutation_to_integer function.
ComplexityMeasures.GaussianCDFEncoding — TypeGaussianCDFEncoding <: Encoding
GaussianCDFEncoding{m}(; μ, σ, c::Int = 3)An encoding scheme that encodes a scalar or vector χ into one of the integers sᵢ ∈ [1, 2, …, c] based on the normal cumulative distribution function (NCDF), and decodes the sᵢ into subintervals of [0, 1] (with some loss of information).
Initializing a GaussianCDFEncoding
The size of the input to be encoded must be known beforehand. One must therefore set m = length(χ), where χ is the input (m = 1 for scalars, m ≥ 2 for vectors). To do so, one must explicitly give m as a type parameter: e.g. encoding = GaussianCDFEncoding{3}(; μ = 0.0, σ = 0.1) to encode 3-element vectors, or encoding = GaussianCDFEncoding{1}(; μ = 0.0, σ = 0.1) to encode scalars.
Description
Encoding/decoding scalars
GaussianCDFEncoding first maps an input scalar $\chi$ to a new real number $y \in [0, 1]$ by using the normal cumulative distribution function (CDF) with the given mean μ and standard deviation σ, according to the map
\[\chi \to y : y = \dfrac{1}{\sigma \sqrt{2 \pi}} \int_{-\infty}^{\chi} e^{-(t - \mu)^2/(2 \sigma^2)} \, dt.\]
Next, the interval [0, 1] is equidistantly binned and enumerated $1, 2, \ldots, c$, and $y$ is linearly mapped to one of these integers using the linear map $y \to z : z = \text{floor}(y(c-1)) + 1$.
Because of the floor operation, some information is lost, so when used with decode, each decoded sᵢ is mapped to a subinterval of [0, 1]. This subinterval is returned as a length-1 Vector{SVector}.
Notice that the decoding step does not yield an element of any outcome space of the estimators that use GaussianCDFEncoding internally, such as Dispersion. That is because these estimators additionally delay embed the encoded data.
Encoding/decoding vectors
If GaussianCDFEncoding is used with a vector χ, then each element of χ is encoded separately, resulting in a length(χ) sequence of integers which may be treated as a CartesianIndex. The encoded symbol s ∈ [1, 2, …, c] is then just the linear index corresponding to this cartesian index (similar to how CombinationEncoding works).
When decoded, the integer symbol s is converted back into its CartesianIndex representation, which is just a sequence of integers that refer to subdivisions of the [0, 1] interval. The relevant subintervals are then returned as a length-χ Vector{SVector}.
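A rough sketch of the vector case (the parameter values here are assumptions; m = 2 elements, each discretized into c = 3 bins, so the combined symbol lies in 1:9):
using ComplexityMeasures

encoding = GaussianCDFEncoding{2}(; μ = 0.0, σ = 1.0, c = 3)
s = encode(encoding, [0.5, -0.5])  # a single integer in 1:9 (the linear index of the CartesianIndex)
decode(encoding, s)                # a length-2 Vector of subintervals of [0, 1]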
Examples
julia> using ComplexityMeasures, Statistics
julia> x = [0.1, 0.4, 0.7, -2.1, 8.0];
julia> μ, σ = mean(x), std(x); encoding = GaussianCDFEncoding(; μ, σ, c = 5)
julia> es = encode.(Ref(encoding), x)
5-element Vector{Int64}:
2
2
3
1
5
julia> decode(encoding, 3)
2-element SVector{2, Float64} with indices SOneTo(2):
0.4
0.6ComplexityMeasures.RectangularBinEncoding — TypeRectangularBinEncoding <: Encoding
RectangularBinEncoding(binning::RectangularBinning, x)
RectangularBinEncoding(binning::FixedRectangularBinning)An encoding scheme that encodes points χ ∈ x into their histogram bins.
The first call signature simply initializes a FixedRectangularBinning and then calls the second call signature.
See FixedRectangularBinning for info on mapping points to bins.
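A rough sketch (the binning and data here are assumptions; 5 bins per dimension, fitted to 2D input):
using ComplexityMeasures

x = StateSpaceSet(rand(1000, 2))                             # 1000 two-dimensional points
encoding = RectangularBinEncoding(RectangularBinning(5), x)  # 5 bins per dimension, fitted to x
i = encode(encoding, x[1])  # the (positive integer) bin index of the first point
ω = decode(encoding, i)     # the bin this integer corresponds to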
ComplexityMeasures.RelativeMeanEncoding — TypeRelativeMeanEncoding <: Encoding
RelativeMeanEncoding(minval::Real, maxval::Real; n = 2)RelativeMeanEncoding encodes a vector based on the relative position the mean of the vector has with respect to a predefined minimum and maximum value (minval and maxval, respectively).
Description
This encoding is inspired by Azami and Escudero (2016)'s algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the amplitude part of the correction as an integer symbol Λ ∈ [1, 2, …, n]. The first-difference part of the encoding is available as the RelativeFirstDifferenceEncoding encoding.
Encoding/decoding
When used with encode, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $\Lambda = \dfrac{1}{m}\sum_{i=1}^m |x_i|$. The value of $\Lambda$ is then normalized to lie on the interval [0, 1], assuming that the minimum/maximum value any single element $x_i$ can take is minval/maxval, respectively. Finally, the interval [0, 1] is discretized into n discrete bins, enumerated by positive integers 1, 2, …, n, and the number of the bin that the normalized $\Lambda$ falls into is returned.
When used with decode, the left-edge of the bin that the normalized $Λ$ fell into is returned.
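A minimal sketch (assuming the data are known to be bounded in [0, 1], so minval = 0 and maxval = 1):
using ComplexityMeasures

encoding = RelativeMeanEncoding(0.0, 1.0; n = 4)
x = [0.1, 0.4, 0.7]      # mean 0.4, already on [minval, maxval]
i = encode(encoding, x)  # bin number in 1:4 (here: bin 2, since 0.4 ∈ [0.25, 0.5))
decode(encoding, i)      # left edge of that bin (here: 0.25)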
ComplexityMeasures.RelativeFirstDifferenceEncoding — TypeRelativeFirstDifferenceEncoding <: Encoding
RelativeFirstDifferenceEncoding(minval::Real, maxval::Real; n = 2)RelativeFirstDifferenceEncoding encodes a vector based on the relative position the average of the first differences of the vectors has with respect to a predefined minimum and maximum value (minval and maxval, respectively).
Description
This encoding is inspired by Azami and Escudero (2016)'s algorithm for amplitude-aware permutation entropy. They use a linear combination of amplitude information and first differences information of state vectors to correct probabilities. Here, however, we explicitly encode the first-differences part of the correction as an integer symbol Λ ∈ [1, 2, …, n]. The amplitude part of the encoding is available as the RelativeMeanEncoding encoding.
Encoding/decoding
When used with encode, an $m$-element state vector $\bf{x} = (x_1, x_2, \ldots, x_m)$ is encoded as $\Lambda = \dfrac{1}{m - 1}\sum_{k=2}^m |x_{k} - x_{k-1}|$. The value of $\Lambda$ is then normalized to lie on the interval [0, 1], assuming that the minimum/maximum value any single $|x_k - x_{k-1}|$ can take is minval/maxval, respectively. Finally, the interval [0, 1] is discretized into n discrete bins, enumerated by positive integers 1, 2, …, n, and the number of the bin that the normalized $\Lambda$ falls into is returned. The smaller the mean first difference of the state vector, the smaller the bin number; the larger the mean first difference, the higher the bin number.
When used with decode, the left-edge of the bin that the normalized $Λ$ fell into is returned.
Performance tips
If you are encoding multiple input vectors, it is more efficient to construct a RelativeFirstDifferenceEncoding instance and re-use it:
using ComplexityMeasures

minval, maxval = 0, 1
encoding = RelativeFirstDifferenceEncoding(minval, maxval; n = 4)
pts = [rand(3) for i = 1:1000]
[encode(encoding, x) for x in pts]ComplexityMeasures.UniqueElementsEncoding — TypeUniqueElementsEncoding <: Encoding
UniqueElementsEncoding(x)UniqueElementsEncoding is a generic encoding that encodes each xᵢ ∈ unique(x) to one of the positive integers. The xᵢ are encoded according to the order of their first appearance in the input data.
The constructor requires the input data x, since the number of possible symbols is length(unique(x)).
Example
using ComplexityMeasures
x = ['a', 2, 5, 2, 5, 'a']
e = UniqueElementsEncoding(x)
encode.(Ref(e), x) == [1, 2, 3, 2, 3, 1] # trueComplexityMeasures.PairDistanceEncoding — TypePairDistanceEncoding <: Encoding
PairDistanceEncoding(min_dist, max_dist; n = 2, metric = Chebyshev(), precise = false)An encoding that encodes point pairs of the form Tuple{<:AbstractVector, <:AbstractVector} by first computing their distance using the given metric, then dividing the interval [min_dist, max_dist] into n equal-size bins, and mapping the computed distance onto one of those bins. Bins are enumerated as 1:n. When decode-ing the bin integer, the left edge of the bin is returned.
precise has the same meaning as in RectangularBinEncoding.
Example
Let's create an example where the minimum and maximum allowed distance is known.
using ComplexityMeasures, Distances, StaticArrays
m = Chebyshev()
y = [SVector(1.0), SVector(0.5), SVector(0.25), SVector(0.64)]
pair1, pair2, pair3 = (y[1], y[2]), (y[2], y[3]), (y[3], y[4])
dmax = m(pair1...) # dist = 0.50
dmin = m(pair2...) # dist = 0.25
dmid = m(pair3...) # dist = 0.39
# This should give five bins with left edges at [0.25], [0.30], [0.35], [0.40] and [0.45]
encoding = PairDistanceEncoding(dmin, dmax; n = 5, metric = m)
c1 = encode(encoding, pair1) # 5
c2 = encode(encoding, pair2) # 1
c3 = encode(encoding, pair3) # 3
decode(encoding, c3) ≈ [0.35] # trueComplexityMeasures.BubbleSortSwapsEncoding — TypeBubbleSortSwapsEncoding <: Encoding
BubbleSortSwapsEncoding{m}()BubbleSortSwapsEncoding is used with encode to encode a length-m input vector x into an integer in the range ω ∈ 0:((m*(m-1)) ÷ 2), by counting the number of swaps required for the bubble sort algorithm to sort x in ascending order.
decode is not implemented for this encoding.
Example
using ComplexityMeasures
x = [1, 5, 3, 1, 2]
e = BubbleSortSwapsEncoding{5}() # constructor type argument must match length of vector
encode(e, x)ComplexityMeasures.CombinationEncoding — TypeCombinationEncoding <: Encoding
CombinationEncoding(encodings)A CombinationEncoding takes multiple Encodings and creates a combined encoding that can be used to encode inputs that are compatible with the given encodings.
Encoding/decoding
When used with encode, each Encoding in encodings returns an integer in the set 1, 2, …, nᵢ, where nᵢ is the total number of outcomes for the i-th encoding. For k different encodings, we can thus construct the cartesian coordinate (c₁, c₂, …, cₖ), with cᵢ ∈ 1, 2, …, nᵢ, which can be uniquely identified by a single integer. Each unique combined encoding is therefore identified by a single integer.
When used with decode, the integer symbol is converted to its corresponding cartesian coordinate, which is used to retrieve the decoded symbols for each of the encodings, and a tuple of the decoded symbols are returned.
The total number of outcomes is prod(total_outcomes(e) for e in encodings).
Examples
using ComplexityMeasures
# We want to encode the vector `x`.
x = [0.9, 0.2, 0.3]
# To do so, we will use a combination of first-difference encoding, amplitude encoding,
# and ordinal pattern encoding.
encodings = (
RelativeFirstDifferenceEncoding(0, 1; n = 2),
RelativeMeanEncoding(0, 1; n = 5),
OrdinalPatternEncoding(3) # x is a three-element vector
)
c = CombinationEncoding(encodings)
# Encode `x` as integer
ω = encode(c, x)
# Decode symbol (into a vector of decodings, one for each encodings `e ∈ encodings`).
# In this particular case, the first two element will be left-bin edges, and
# the last element will be the decoded ordinal pattern (indices that would sort `x`).
d = decode(c, ω)
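# As noted above, the total number of outcomes of the combined encoding is
# prod(total_outcomes(e) for e in encodings) == 2 * 5 * 6 == 60.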