Entropies & Probabilities
Here we discuss obtaining probabilities and entropies from a given dataset (that typically represents a trajectory or set in the state space of a dynamical system). The data are expected in the form discussed in Numerical Data.
The main API for this is contained in two functions:

- probabilities, which computes probability distributions of given datasets.
- genentropy, which uses the output of probabilities, or a set of pre-computed Probabilities, to calculate entropies.
These functions dispatch on subtypes of ProbabilitiesEstimator, which are summarized in the Probabilities Estimators page.
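For orientation, here is a minimal sketch of the two-step workflow (the input data are just random numbers, standing in for a real trajectory):

```julia
using Entropies

x = rand(1000)                        # stand-in for a real trajectory
p = probabilities(x, 0.1)             # histogram probabilities with box size ε = 0.1
h = genentropy(p; q = 1.0, base = 2)  # Shannon entropy of that distribution, in bits
```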
Probabilities
Entropies.Probabilities — Type

Probabilities(x) → p

A simple wrapper type around an x::AbstractVector which ensures that p sums to 1. Behaves identically to Vector.
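A small usage sketch: the wrapper normalizes its input, so any nonnegative weight vector becomes a valid distribution.

```julia
using Entropies

w = [1.0, 2.0, 1.0]    # weights that do not sum to 1
p = Probabilities(w)   # normalized: p == [0.25, 0.5, 0.25]
sum(p)                 # 1.0
p[2]                   # 0.5; indexing behaves like a Vector
```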
Entropies.probabilities — Function

probabilities(x::Vector_or_Dataset, est::ProbabilitiesEstimator) → p::Probabilities

Calculate probabilities representing x based on the provided estimator and return them as a Probabilities container (Vector-like). The probabilities are typically unordered and may or may not contain 0s; see the documentation of the individual estimators for more.

The configuration options are always given as arguments to the chosen estimator.
probabilities(x::Vector_or_Dataset, ε::AbstractFloat) → p::Probabilities

Convenience syntax which provides probabilities for x based on rectangular binning (i.e. performing a histogram). In short, the state space is divided into boxes of side length ε, and formally we use est = VisitationFrequency(RectangularBinning(ε)) as the estimator; see VisitationFrequency.

This method has linearithmic time complexity (n log(n) for n = length(x)) and linear space complexity (l for l = dimension(x)). This allows computing probabilities (histograms) of high-dimensional datasets with small box sizes ε, without memory overflow and with maximum performance. To obtain the bin information along with p, use binhist.
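For example, assuming the Dataset structure from DelayEmbeddings (re-exported by DynamicalSystems.jl), the convenience syntax and the explicit estimator give the same result:

```julia
using Entropies, DelayEmbeddings  # DelayEmbeddings provides Dataset

x = Dataset(rand(10_000, 3))      # stand-in 3-dimensional dataset
p1 = probabilities(x, 0.2)        # boxes of side length ε = 0.2

# equivalent explicit form:
est = VisitationFrequency(RectangularBinning(0.2))
p2 = probabilities(x, est)
```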
probabilities(x::Vector_or_Dataset, n::Integer) → p::Probabilities

Same as the above method, but now each dimension of the data is binned into n::Int equally sized bins instead of bins of fixed length ε::AbstractFloat.
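For instance, binning each dimension into 10 equal parts (again with stand-in data and the Dataset structure from DelayEmbeddings):

```julia
using Entropies, DelayEmbeddings

x = Dataset(rand(10_000, 2))  # stand-in 2-dimensional dataset
p = probabilities(x, 10)      # 10 equally sized bins along each dimension
```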
probabilities(x::Vector_or_Dataset) → p::Probabilities

Directly count probabilities from the elements of x without any discretization, binning, or other processing (mostly useful when x contains categorical or integer data).
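For example, with integer data each distinct element becomes one outcome (recall from above that the returned probabilities are unordered):

```julia
using Entropies

x = [1, 2, 2, 3, 3, 3]
p = probabilities(x)   # contains 1/6, 2/6 and 3/6, in unspecified order
```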
Entropies.probabilities! — Function

probabilities!(args...)

Identical to probabilities(args...), but allows pre-allocation of temporarily used containers. Only works for certain estimators. See for example SymbolicPermutation.
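A sketch with SymbolicPermutation; the size of the pre-allocated symbol container s below is an assumption about the requirement for timeseries input, so check the estimator's docstring for the exact convention:

```julia
using Entropies

m, τ = 3, 1
x = rand(10_000)                       # stand-in timeseries
est = SymbolicPermutation(; m, τ)
s = zeros(Int, length(x) - (m - 1)*τ)  # assumed required length for vector input
p = probabilities!(s, x, est)          # s is reused, avoiding re-allocation
```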
Fast histograms
Entropies.binhist — Function

binhist(x::AbstractDataset, ε::Real) → p, bins
binhist(x::AbstractDataset, ε::RectangularBinning) → p, bins

Hyper-optimized histogram calculation for x with rectangular binning ε. Returns the probabilities p of each bin of the histogram as well as the bins. Notice that bins are the starting corners of each bin. If ε isa Real, then the actual bin size is ε across each dimension. If ε isa RectangularBinning, then the bin size for each dimension will depend on the binning scheme.

See also: RectangularBinning.
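A usage sketch (stand-in data, with Dataset from DelayEmbeddings):

```julia
using Entropies, DelayEmbeddings

x = Dataset(rand(5_000, 2))  # stand-in 2-dimensional dataset
p, bins = binhist(x, 0.25)   # probabilities and the starting corner of each occupied bin
```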
Generalized entropy
In the study of dynamical systems there are many quantities that identify as "entropy". Notice that these quantities are not the thermodynamic ones used in Statistical Physics. Rather, they are closer to the entropies of information theory.
All of the entropy-related quantities boil down to one thing: first extracting probabilities from a dataset and then applying the generalized entropy formula using genentropy.
Entropies.genentropy — Function

genentropy(p::Probabilities; q = 1.0, base = MathConstants.e)

Compute the generalized order-q entropy of some probabilities returned by the probabilities function. Alternatively, compute entropy from pre-computed Probabilities.
genentropy(x::Vector_or_Dataset, est; q = 1.0, base)

A convenience syntax, which first calls probabilities(x, est) and then calculates the entropy of the result (and thus est can be a ProbabilitiesEstimator or simply ε::Real).
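For example, the following computes the correlation entropy (q = 2) of an ε = 0.1 histogram in one call (stand-in data):

```julia
using Entropies

x = rand(10_000)                         # stand-in timeseries
h = genentropy(x, 0.1; q = 2, base = 2)  # same as genentropy(probabilities(x, 0.1); q = 2, base = 2)
```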
Description
Let $p$ be an array of probabilities (summing to 1). Then the generalized (Rényi[Rényi1960]) entropy is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, such as the information entropy ($q = 1$, see [Shannon1948]), the maximum entropy ($q = 0$, also known as Hartley entropy), or the correlation entropy ($q = 2$, also known as collision entropy).
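A quick sanity check of these special cases: for a uniform distribution over N states the formula yields H_q = log(N) for every q, so all three entropies below coincide.

```julia
using Entropies

p = Probabilities(fill(1/4, 4))  # uniform distribution over N = 4 states
genentropy(p; q = 0, base = 2)   # 2.0 = log2(4), maximum (Hartley) entropy
genentropy(p; q = 1, base = 2)   # 2.0, information (Shannon) entropy
genentropy(p; q = 2, base = 2)   # 2.0, correlation (collision) entropy
```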
ChaosTools.permentropy — Function

permentropy(x, m = 3; τ = 1, base = Base.MathConstants.e)

Compute the permutation entropy[Bandt2002] of order m from the timeseries x.

This method is equivalent to

genentropy(x, SymbolicPermutation(; m, τ); base)
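A sketch of that equivalence (loading both packages explicitly, though ChaosTools re-exports much of this API):

```julia
using ChaosTools, Entropies

x = rand(10_000)                        # stand-in timeseries
h1 = permentropy(x, 3; τ = 1, base = 2)
h2 = genentropy(x, SymbolicPermutation(; m = 3, τ = 1); base = 2)
# h1 ≈ h2
```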
- [Rényi1960] A. Rényi, Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547 (1960).
- [Shannon1948] C. E. Shannon, Bell Systems Technical Journal 27, pp. 379 (1948).
- [Bandt2002] C. Bandt & B. Pompe, Phys. Rev. Lett. 88 (17), 174102 (2002).