Probabilities

Note

Be sure you have gone through the Tutorial before going through the API here to have a good idea of the terminology used in ComplexityMeasures.jl.

ComplexityMeasures.jl implements an interface for probabilities that exactly follows the mathematically rigorous formulation of probability spaces. A probability space is formalized by an OutcomeSpace $\Omega$. Probabilities are then extracted from data by passing an outcome space to the functions counts and probabilities. The mathematical formulation of probability spaces is further enhanced by ProbabilitiesEstimator and its subtypes, which may correct theoretically known biases when estimating probabilities from finite data.
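As a minimal sketch of this workflow, the snippet below passes an outcome space to counts and probabilities. It assumes the ComplexityMeasures.jl v3 calling convention (outcome space first, data second); the input vector is made up for illustration.

```julia
using ComplexityMeasures

# Made-up input; UniqueElements treats each distinct value as one outcome.
x = [1, 2, 1, 1, 3, 2]
o = UniqueElements()

c = counts(o, x)         # raw counts of each observed outcome
p = probabilities(o, x)  # relative frequencies, i.e. a probability mass function

println(c)
println(p)
```

Both calls reference the same outcome space, so the probabilities are simply the counts divided by the total number of observations.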

In general, probabilities can be either discrete (mass functions) or continuous (density functions). Currently in ComplexityMeasures.jl, only probability mass functions (i.e., countable $\Omega$) are implemented explicitly. Quantities that are estimated from probability density functions (i.e., uncountable $\Omega$) also exist and are implemented in ComplexityMeasures.jl. However, these are estimated by a one-step process without the intermediate estimation of probabilities.

If $\Omega$ is countable, the process of estimating the outcomes from input data is also called discretization of the input data.

Outcome spaces

ComplexityMeasures.OutcomeSpace — Type

OutcomeSpace

The supertype for all outcome space implementations.

Description

In ComplexityMeasures.jl, an outcome space defines a set of possible outcomes $\Omega = \{\omega_1, \omega_2, \ldots, \omega_L \}$ (some form of discretization). In the literature, the outcome space is often also called an "alphabet", while each outcome is called a "symbol" or an "event".

An outcome space also defines a set of rules for mapping input data to each outcome $\omega_i$, a process called encoding, symbolizing, or discretizing in the literature (see encodings). Some OutcomeSpaces first apply a transformation, e.g. a delay embedding, to the data before discretizing/encoding, while other OutcomeSpaces discretize/encode the data directly.
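To illustrate what "transform, then encode" means, here is a Base-Julia sketch that delay-embeds a time series and maps each embedded vector to its ordinal pattern. The function name `ordinal_symbols` and its keywords are hypothetical, and this is not the package's implementation; it only mirrors the idea.

```julia
# Illustrative sketch only: delay-embed x with dimension m and delay τ,
# then encode each embedded vector as the permutation that sorts it.
function ordinal_symbols(x::AbstractVector{<:Real}; m::Int = 3, τ::Int = 1)
    n = length(x) - (m - 1) * τ  # number of embedded state vectors
    n > 0 || throw(ArgumentError("time series too short for m = $m, τ = $τ"))
    return [sortperm(x[i:τ:(i + (m - 1) * τ)]) for i in 1:n]
end

x = [0.3, 1.2, 0.7, 2.5, 1.9]  # made-up time series
symbols = ordinal_symbols(x; m = 3, τ = 1)
println(symbols)
```

Each element of `symbols` is one outcome $\omega_i$; counting how often each permutation occurs would then give the raw counts for this outcome space.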

Implementations

Outcome space	Principle	Input data	Counting-compatible
`UniqueElements`	Count of unique elements	`Any`	✔
`ValueBinning`	Binning (histogram)	`Vector`, `StateSpaceSet`	✔
`OrdinalPatterns`	Ordinal patterns	`Vector`, `StateSpaceSet`	✔
`SpatialOrdinalPatterns`	Ordinal patterns in space	`Array`	✔
`Dispersion`	Dispersion patterns	`Vector`	✔
`SpatialDispersion`	Dispersion patterns in space	`Array`	✔
`CosineSimilarityBinning`	Cosine similarity	`Vector`	✔
`BubbleSortSwaps`	Swap counts when sorting	`Vector`	✔
`SequentialPairDistances`	Sequential state vector distances	`Vector`, `StateSpaceSet`	✔
`TransferOperator`	Binning (transfer operator)	`Vector`, `StateSpaceSet`	✖
`NaiveKernel`	Kernel density estimation	`StateSpaceSet`	✖
`WeightedOrdinalPatterns`	Ordinal patterns	`Vector`, `StateSpaceSet`	✖
`AmplitudeAwareOrdinalPatterns`	Ordinal patterns	`Vector`, `StateSpaceSet`	✖
`WaveletOverlap`	Wavelet transform	`Vector`	✖
`PowerSpectrum`	Fourier transform	`Vector`	✖

In the column "input data" it is assumed that the eltype of the input is <: Real.

Usage

Outcome spaces are used as input to

probabilities/allprobabilities_and_outcomes for computing probability mass functions.
outcome_space, which returns the elements of the outcome space.
total_outcomes, which returns the cardinality of the outcome space.
counts/counts_and_outcomes/allcounts_and_outcomes, for obtaining raw counts instead of probabilities (only for counting-compatible outcome spaces).
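The functions listed above can be combined as in the following sketch. It assumes the v3 constructor `OrdinalPatterns{3}()` and the outcome-space-first argument order; the time series is made up and contains no ties.

```julia
using ComplexityMeasures

x = [0.3, 1.2, 0.7, 2.5, 1.9, 2.1, 0.4]  # made-up time series
o = OrdinalPatterns{3}()                  # ordinal patterns of length 3

# Only the outcomes that actually occur in x.
probs, outs = probabilities_and_outcomes(o, x)

# Every outcome in Ω, with zero probability for those never observed.
allprobs, allouts = allprobabilities_and_outcomes(o, x)

# Raw counts; available because OrdinalPatterns is counting-compatible.
cts, cts_outs = counts_and_outcomes(o, x)

println(length(probs), " observed outcomes out of ", length(allprobs))
```

The `all*` variants are useful when results from different inputs must be compared element by element, since they always cover the full outcome space.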

Counting-compatible vs. non-counting-compatible outcome spaces

There are two main types of outcome spaces.

Counting-compatible outcome spaces have a well-defined way of counting how often each point in the (encoded) input data is mapped to a particular outcome $\omega_i$. These outcome spaces use encode to discretize the input data. Examples are OrdinalPatterns (which encodes input data into ordinal patterns) or ValueBinning (which discretizes points onto a regular grid). The table above lists which outcome spaces are counting-compatible.
Non-counting-compatible outcome spaces have no well-defined way of counting explicitly how often each point in the input data is mapped to a particular outcome $\omega_i$. Instead, these outcome spaces return a vector of pre-normalized "relative counts", one for each outcome $\omega_i$. Examples are WaveletOverlap or PowerSpectrum.

Counting-compatible outcome spaces can be used with any ProbabilitiesEstimator to convert counts into probability mass functions. Non-counting-compatible outcome spaces can only be used with the maximum likelihood (RelativeAmount) probabilities estimator, which estimates probabilities as the relative frequency of each outcome. Formally speaking, the RelativeAmount estimator also requires counts, but for the sake of code consistency, it may be used with relative frequencies as well.

The function is_counting_based can be used to check whether an outcome space is based on counting.
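A short sketch of how this distinction shows up in practice. It assumes the v3 signature `probabilities(est, o, x)` and that `BayesianRegularization()` and `PowerSpectrum()` can be constructed with default arguments; the data are made up.

```julia
using ComplexityMeasures

x = [0.3, 1.2, 0.7, 2.5, 1.9, 2.1, 0.4, 1.1]  # made-up time series

ord = OrdinalPatterns{3}()
spec = PowerSpectrum()

println(is_counting_based(ord))   # counting-compatible according to the table
println(is_counting_based(spec))  # not counting-compatible according to the table

# A counting-compatible outcome space can be paired with a bias-correcting estimator.
p_reg = probabilities(BayesianRegularization(), ord, x)

# Without an explicit estimator, relative frequencies (RelativeAmount) are used.
p_spec = probabilities(spec, x)
```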

Deducing the outcome space (from data)

Some outcome space models can deduce $\Omega$ without knowledge of the input, such as OrdinalPatterns. Other outcome spaces require knowledge of the input data for concretely specifying $\Omega$, such as ValueBinning with RectangularBinning. If o is some outcome space model and x some input data, then outcome_space(o, x) returns the possible outcomes $\Omega$. To get the cardinality of $\Omega$, use total_outcomes.
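The difference can be seen in a sketch like the one below, which assumes the v3 constructors `OrdinalPatterns{3}()` and `ValueBinning(RectangularBinning(4))`. The input is always passed here, which is valid for both kinds of outcome space.

```julia
using ComplexityMeasures

x = [0.3, 1.2, 0.7, 2.5, 1.9, 2.1, 0.4]  # made-up time series

# Ω for ordinal patterns is fixed by the pattern length alone (3! = 6 permutations).
o_ord = OrdinalPatterns{3}()
Ω_ord = outcome_space(o_ord, x)
println(total_outcomes(o_ord, x))

# With RectangularBinning the grid extent is derived from the data, so Ω depends on x.
o_bin = ValueBinning(RectangularBinning(4))
Ω_bin = outcome_space(o_bin, x)
println(total_outcomes(o_bin, x))
```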

Implementation details

The element type of $\Omega$ varies between outcome space models, but it is guaranteed to be hashable and sortable. This allows for conveniently tracking the counts of a specific event across experimental realizations, by using the outcome as a dictionary key and the counts as the value for that key (or, alternatively, the key remains the outcome and one has a vector of probabilities, one for each experimental realization).
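A minimal sketch of the dictionary pattern described above, assuming that `counts_and_outcomes` returns the counts and a vector of outcomes for one-dimensional input. The realizations and the `tracked` dictionary are made up for illustration.

```julia
using ComplexityMeasures

# Three made-up experimental realizations of a categorical process.
realizations = [[1, 2, 1, 3], [2, 2, 3, 3], [1, 1, 1, 2]]

# Outcome => vector of counts, one entry per realization (0 if never observed).
tracked = Dict{Int, Vector{Int}}()
for (k, x) in enumerate(realizations)
    cts, outs = counts_and_outcomes(UniqueElements(), x)
    for (ω, c) in zip(outs, cts)
        v = get!(tracked, ω, zeros(Int, length(realizations)))
        v[k] = c
    end
end

# Sortable outcomes make it easy to print results in a stable order.
for ω in sort(collect(keys(tracked)))
    println(ω, " => ", tracked[ω])
end
```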