Generalized entropy
For probability distributions
Generalized entropy is a property of probability distributions.
Entropies.genentropy — Method

Generalized entropy of a probability distribution
genentropy(α::Real, p::AbstractArray; base = Base.MathConstants.e)

Compute the entropy, to the given base, of an array of probabilities p (assuming that p is sum-normalized).
If a multivariate Dataset x is given, then a sum-normalized histogram is obtained directly from the elements of x, and the generalized entropy is computed on that distribution.
Description
Let $p$ be an array of probabilities (summing to 1). Then the Rényi entropy of order $\alpha$ is

$$H_\alpha(p) = \frac{1}{1-\alpha}\log\left(\sum_i p_i^\alpha\right),$$

and generalizes other known entropies, such as the information entropy ($\alpha = 1$, see [Shannon1948]), the maximum entropy ($\alpha = 0$, also known as Hartley entropy), or the correlation entropy ($\alpha = 2$, also known as collision entropy).
Example
using Entropies
p = rand(5000)
p = p ./ sum(p) # normalizing to 1 ensures we have a probability distribution
# Estimate order-1 generalized entropy to base 2 of the distribution
Entropies.genentropy(1, p, base = 2)

See also: non0hist.
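The special cases from the description can be checked with the same signature. A minimal sketch (the distribution is random, so results vary between runs):

using Entropies

p = rand(5000)
p = p ./ sum(p)

# α = 0: maximum (Hartley) entropy, the logarithm of the number
# of states with nonzero probability
Entropies.genentropy(0, p, base = 2)

# α = 1: information (Shannon) entropy
Entropies.genentropy(1, p, base = 2)

# α = 2: correlation (collision) entropy
Entropies.genentropy(2, p, base = 2)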
For real data (ordered sequences, time series)
The method above only works when you actually have access to a probability distribution. In most cases, probability distributions have to be estimated from data.
Currently, we implement the following probability estimators:
- CountOccurrences
- VisitationFrequency
- SymbolicPermutation
- SymbolicWeightedPermutation
- SymbolicAmplitudeAwarePermutation
Getting the distributions
Distributions can be obtained directly for a dataset x using the signature
probabilities(x, estimator)
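For example, with the CountOccurrences estimator from the list above (a minimal sketch; any of the estimators can be substituted):

using Entropies, DelayEmbeddings

# A dataset with many repeated state vectors
D = Dataset(rand(1:3, 1000, 3))

# Sum-normalized probability distribution over the distinct state vectors in D
probabilities(D, CountOccurrences())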
Computing the entropy
The syntax for computing generalized entropy with the different estimators is as follows.
Entropies.genentropy — Method

Entropy based on counting occurrences of distinct elements
genentropy(x::AbstractDataset, est::CountOccurrences, α = 1; base = Base.MathConstants.e)
genentropy(x::AbstractVector{T}, est::CountOccurrences, α = 1; base = Base.MathConstants.e) where T

Compute the order-α generalized (Rényi) entropy[Rényi1960] of a dataset x by counting repeated elements in x. A sum-normalized histogram is then obtained from the counts of repeated elements, and the generalized entropy is computed from that distribution. Assumes that x can be sorted.
Example
using Entropies, DelayEmbeddings
# A dataset with many identical state vectors
D = Dataset(rand(1:3, 5000, 3))
# Estimate order-1 generalized entropy to base 2 of the dataset
Entropies.genentropy(D, CountOccurrences(), 1, base = 2)

using Entropies, DelayEmbeddings
# A bunch of tuples, many potentially identical
x = [(rand(1:5), rand(1:5), rand(1:5)) for i = 1:10000]
# Default generalized entropy of the tuples
Entropies.genentropy(x, CountOccurrences())

See also: CountOccurrences.
Permutation entropy
genentropy(x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real
genentropy!(s::Vector{Int}, x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy!(s::Vector{Int}, x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the generalized order-α entropy over a permutation symbolization of x, using symbol size/order m.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
A pre-allocated symbol array s can be provided to save some memory allocations if probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x, using probabilities(::AbstractDataset, ::SymbolicPermutation). Sum-normalizing this histogram yields a probability distribution over the symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order of the generalized entropy (α) with the order of the permutation entropy (m, which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
See also: SymbolicPermutation, genentropy.
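For example, a minimal sketch covering both the allocating and pre-allocated signatures. It assumes SymbolicPermutation() can be constructed without arguments, analogous to CountOccurrences() above:

using Entropies

x = rand(10000)

# Order-1 permutation entropy of a univariate signal, with embedding
# dimension m = 3 and embedding lag τ = 1
Entropies.genentropy(x, SymbolicPermutation(), 1, m = 3, τ = 1, base = 2)

# Pre-allocated version; s must satisfy length(s) == length(x) - (m-1)τ
m, τ = 3, 1
s = zeros(Int, length(x) - (m - 1)*τ)
Entropies.genentropy!(s, x, SymbolicPermutation(), 1, m = m, τ = τ, base = 2)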
Weighted permutation entropy
genentropy(x::AbstractDataset, est::SymbolicWeightedPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicWeightedPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the generalized order-α entropy based on a weighted permutation symbolization of x, using symbol size/order m for the permutations.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x by a weighted procedure, using probabilities(::AbstractDataset, ::SymbolicWeightedPermutation). Sum-normalizing this histogram yields a probability distribution over the weighted symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order of the generalized entropy (α) with the order of the permutation entropy (m, which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
See also: SymbolicWeightedPermutation, genentropy.
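A minimal usage sketch, again assuming the estimator can be constructed without arguments:

using Entropies

x = rand(10000)

# Order-1 weighted permutation entropy, embedding dimension m = 3, lag τ = 1
Entropies.genentropy(x, SymbolicWeightedPermutation(), 1, m = 3, τ = 1, base = 2)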
Amplitude-aware permutation entropy
genentropy(x::AbstractDataset, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the generalized order-α entropy based on an amplitude-sensitive permutation symbolization of x, using symbol size/order m for the permutations.
If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.
Probability and entropy estimation
An unordered symbol frequency histogram is obtained by symbolizing the points in x by an amplitude-aware procedure, using probabilities(::AbstractDataset, ::SymbolicAmplitudeAwarePermutation). Sum-normalizing this histogram yields a probability distribution over the amplitude-encoding symbols.
After the symbolization histogram/distribution has been obtained, the order-α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.
Do not confuse the order of the generalized entropy (α) with the order of the permutation entropy (m, which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
See also: SymbolicAmplitudeAwarePermutation, genentropy.
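A minimal usage sketch, under the same no-argument constructor assumption:

using Entropies

x = rand(10000)

# Order-1 amplitude-aware permutation entropy, embedding dimension m = 3, lag τ = 1
Entropies.genentropy(x, SymbolicAmplitudeAwarePermutation(), 1, m = 3, τ = 1, base = 2)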
- [Rényi1960] A. Rényi, Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, pp. 547 (1960)
- [Shannon1948] C. E. Shannon, Bell System Technical Journal 27, pp. 379 (1948)