Generalized entropy
For probability distributions
Generalized entropy is a property of probability distributions.
Entropies.genentropy — Method
Generalized entropy of a probability distribution.
genentropy(α::Real, p::AbstractArray; base = Base.MathConstants.e)
Compute the entropy, to the given base, of an array of probabilities p (assuming that p is sum-normalized).
If a multivariate Dataset x is given, then a sum-normalized histogram is obtained directly from the elements of x, and the generalized entropy is computed on that distribution.
Description
Let $p$ be an array of probabilities (summing to 1). Then the order-$\alpha$ Rényi entropy is

$$H_\alpha(p) = \frac{1}{1 - \alpha} \log \left( \sum_i p_i^\alpha \right),$$

and generalizes other known entropies, e.g. the information entropy ($\alpha = 1$, see [Shannon1948]), the maximum entropy ($\alpha = 0$, also known as Hartley entropy), or the correlation entropy ($\alpha = 2$, also known as collision entropy).
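For $\alpha = 1$ the defining expression is a $0/0$ limit; a short L'Hôpital computation (added here for clarity, it follows directly from the formula above) shows that the limit is the Shannon entropy:

$$\lim_{\alpha \to 1} H_\alpha(p) = \lim_{\alpha \to 1} \frac{\sum_i p_i^\alpha \log p_i}{-\sum_i p_i^\alpha} = -\sum_i p_i \log p_i.$$

Likewise, $H_0(p) = \log N$, where $N$ is the number of nonzero probabilities, and $H_2(p) = -\log \sum_i p_i^2$.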
Example
using Entropies
p = rand(5000)
p = p ./ sum(p) # normalizing to 1 ensures we have a probability distribution
# Estimate order-1 generalized entropy to base 2 of the distribution
Entropies.genentropy(1, p, base = 2)
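The docstring also mentions passing a multivariate Dataset directly. A minimal sketch, assuming the same argument order as the array signature (α first):

using Entropies, DelayEmbeddings
# A multivariate dataset with many repeated points; a sum-normalized
# histogram is obtained directly from its elements
D = Dataset(rand(1:3, 2000, 2))
# Order-1 generalized entropy to base 2, computed directly from D
Entropies.genentropy(1, D, base = 2)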
See also: non0hist.
For real data (ordered sequences, time series)
The method above only works when you actually have access to a probability distribution. In most cases, probability distributions have to be estimated from data.
Currently, we implement the following probability estimators:
- CountOccurrences
- VisitationFrequency
- SymbolicPermutation
- SymbolicWeightedPermutation
- SymbolicAmplitudeAwarePermutation
Getting the distributions
Distributions can be obtained directly for a dataset x using the signature
probabilities(x, estimator)
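For instance, a minimal sketch using the CountOccurrences estimator (the no-argument constructor is taken from the examples further down):

using Entropies, DelayEmbeddings
# A dataset with many repeated state vectors
D = Dataset(rand(1:3, 5000, 3))
# Probability distribution over the distinct states, estimated by counting
ps = probabilities(D, CountOccurrences())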
Computing the entropy
The syntax for using the different estimators to compute generalized entropy is as follows.
Entropies.genentropy — Method
Entropy based on counting occurrences of distinct elements.
genentropy(x::AbstractDataset, est::CountOccurrences, α = 1; base = Base.MathConstants.e)
genentropy(x::AbstractVector{T}, est::CountOccurrences, α = 1; base = Base.MathConstants.e) where T
Compute the order-α generalized (Rényi) entropy[Rényi1960] of a dataset x by counting repeated elements in x. A sum-normalized histogram is then obtained from the counts of the repeated elements, and the generalized entropy is computed from that distribution. Assumes that x can be sorted.
Example
using Entropies, DelayEmbeddings
# A dataset with many identical state vectors
D = Dataset(rand(1:3, 5000, 3))
# Estimate order-1 generalized entropy to base 2 of the dataset
Entropies.genentropy(D, CountOccurrences(), 1, base = 2)
using Entropies, DelayEmbeddings
# A bunch of tuples, many potentially identical
x = [(rand(1:5), rand(1:5), rand(1:5)) for i = 1:10000]
# Default generalized entropy of the tuples
Entropies.genentropy(x, CountOccurrences())
See also: CountOccurrences.
Permutation entropy
genentropy(x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real
genentropy!(s::Vector{Int}, x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy!(s::Vector{Int}, x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real
Compute the generalized order-α entropy over a permutation symbolization of x, using symbol size/order m.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
A pre-allocated symbol array s can be provided to save some memory allocations if probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.
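As a sketch of the in-place variant (the symbol vector length follows from the requirement above; the no-argument SymbolicPermutation() constructor is an assumption here):

using Entropies
x = rand(1000)   # univariate signal
m, τ = 3, 1      # embedding dimension and lag
# One symbol per state vector of the delay reconstruction, hence this length;
# the vector can be reused across many signals of the same length
s = zeros(Int, length(x) - (m - 1)*τ)
Entropies.genentropy!(s, x, SymbolicPermutation(), 1, m = m, τ = τ, base = 2)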
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x, using probabilities(::AbstractDataset, ::SymbolicPermutation). Sum-normalizing this histogram yields a probability distribution over the symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order α of the generalized entropy with the order m of the permutation, which controls the symbol size. Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
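A usage sketch for a univariate signal (the no-argument SymbolicPermutation() constructor is assumed; m and τ are passed as keywords, as in the signatures above):

using Entropies
# A noisy periodic signal
x = sin.(0.1 .* (1:2000)) .+ 0.1 .* randn(2000)
# Order-1 permutation entropy to base 2, with embedding dimension m = 3 and lag τ = 1
Entropies.genentropy(x, SymbolicPermutation(), 1, m = 3, τ = 1, base = 2)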
See also: SymbolicPermutation, genentropy.
Weighted permutation entropy
genentropy(x::AbstractDataset, est::SymbolicWeightedPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicWeightedPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real
Compute the generalized order-α entropy based on a weighted permutation symbolization of x, using symbol size/order m for the permutations.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x by a weighted procedure, using probabilities(::AbstractDataset, ::SymbolicWeightedPermutation). Sum-normalizing this histogram yields a probability distribution over the weighted symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order α of the generalized entropy with the order m of the permutation, which controls the symbol size. Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
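Usage for a univariate signal mirrors the plain permutation estimator above. A minimal sketch of the Dataset method, again assuming a no-argument constructor:

using Entropies, DelayEmbeddings
# Symbolization is performed directly on the 3-dimensional state vectors,
# so no m/τ keywords are needed here
D = Dataset(rand(5000, 3))
Entropies.genentropy(D, SymbolicWeightedPermutation(), 1, base = 2)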
See also: SymbolicWeightedPermutation, genentropy.
Amplitude-aware permutation entropy
genentropy(x::AbstractDataset, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real
Compute the generalized order-α entropy based on an amplitude-sensitive permutation symbolization of x, using symbol size/order m for the permutations.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x by an amplitude-aware procedure, using probabilities(::AbstractDataset, ::SymbolicAmplitudeAwarePermutation). Sum-normalizing this histogram yields a probability distribution over the amplitude-encoding symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order α of the generalized entropy with the order m of the permutation, which controls the symbol size. Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
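A minimal sketch comparing the three permutation-based estimators on the same signal (no-argument constructors are again an assumption):

using Entropies
x = sin.(0.1 .* (1:2000)) .+ 0.1 .* randn(2000)
# Order-1 entropies to base 2 for each symbolization scheme
Entropies.genentropy(x, SymbolicPermutation(), 1, m = 3, τ = 1, base = 2)
Entropies.genentropy(x, SymbolicWeightedPermutation(), 1, m = 3, τ = 1, base = 2)
Entropies.genentropy(x, SymbolicAmplitudeAwarePermutation(), 1, m = 3, τ = 1, base = 2)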
See also: SymbolicAmplitudeAwarePermutation, genentropy.
- [Rényi1960] A. Rényi, Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547 (1960)
- [Shannon1948] C. E. Shannon, Bell System Technical Journal 27, pp. 379 (1948)