Generalized entropy

For probability distributions

Generalized entropy is a property of probability distributions.

Entropies.genentropy — Method

Generalized entropy of a probability distribution

genentropy(α::Real, p::AbstractArray; base = Base.MathConstants.e)

Compute the order-α generalized (Rényi) entropy, to the given base, of an array of probabilities p (assuming that p is sum-normalized).

If a multivariate Dataset x is given, then a sum-normalized histogram is obtained directly from the elements of x, and the generalized entropy is computed on that distribution.

Description

Let $p$ be an array of probabilities (summing to 1). Then the Rényi entropy is

\[H_\alpha(p) = \frac{1}{1-\alpha} \log \left(\sum_i p[i]^\alpha\right)\]

and generalizes other known entropies, e.g. the information (Shannon) entropy ($\alpha = 1$, see [Shannon1948]), the maximum entropy ($\alpha = 0$, also known as Hartley entropy), or the correlation entropy ($\alpha = 2$, also known as collision entropy).
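
To make the formula concrete, here is a minimal sketch that evaluates $H_\alpha$ directly (the helpers renyi and shannon are illustrative names, not part of the package):

# A direct transcription of the formula above, not the package implementation
renyi(α, p; base = Base.MathConstants.e) = log(base, sum(p .^ α)) / (1 - α)  # valid for α ≠ 1

# Information (Shannon) entropy, the α → 1 limit of the formula
shannon(p; base = Base.MathConstants.e) = -sum(x -> x * log(base, x), p)

p = fill(1/8, 8)       # uniform distribution over 8 outcomes
renyi(2, p, base = 2)  # 3.0; a uniform distribution gives log2(8) bits for every α
shannon(p, base = 2)   # 3.0; agrees with the α → 1 limit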

Example

using Entropies
p = rand(5000)
p = p ./ sum(p) # normalizing to 1 ensures we have a probability distribution

# Estimate order-1 generalized entropy to base 2 of the distribution
Entropies.genentropy(1, p, base = 2)

See also: non0hist.


For real data (ordered sequences, time series)

The method above only works when you actually have access to a probability distribution. In most cases, probability distributions have to be estimated from data.

Currently, we implement the following probability estimators, each documented below: CountOccurrences, SymbolicPermutation, SymbolicWeightedPermutation, and SymbolicAmplitudeAwarePermutation.

Getting the distributions

Distributions can be obtained directly for a dataset x using the signature

probabilities(x, estimator)
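
For example, using the CountOccurrences estimator documented below (a sketch; the returned distribution is assumed to behave like an array of probabilities):

using Entropies, DelayEmbeddings

# A dataset with many repeated state vectors
x = Dataset(rand(1:3, 1000, 2))

# Relative frequency of each distinct state vector
p = probabilities(x, CountOccurrences())
sum(p) ≈ 1  # true: the histogram is sum-normalized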

Computing the entropy

The syntax for using the different estimators to compute generalized entropy is as follows.

Entropies.genentropy — Method

Entropy based on counting occurrences of distinct elements

genentropy(x::AbstractDataset, est::CountOccurrences, α = 1; base = Base.MathConstants.e)
genentropy(x::AbstractVector{T}, est::CountOccurrences, α = 1; base = Base.MathConstants.e) where T

Compute the order-α generalized (Rényi) entropy[Rényi1960] of a dataset x by counting repeated elements in x. A sum-normalized histogram is obtained from the counts of repeated elements, and the generalized entropy is computed on that distribution. Assumes that x can be sorted.

Example

using Entropies, DelayEmbeddings

# A dataset with many identical state vectors
D = Dataset(rand(1:3, 5000, 3))

# Estimate order-1 generalized entropy to base 2 of the dataset
Entropies.genentropy(D, CountOccurrences(), 1, base = 2)

using Entropies, DelayEmbeddings

# A bunch of tuples, many potentially identical
x = [(rand(1:5), rand(1:5), rand(1:5)) for i = 1:10000]

# Default generalized entropy of the tuples
Entropies.genentropy(x, CountOccurrences())

See also: CountOccurrences.


Permutation entropy

genentropy(x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

genentropy!(s::Vector{Int}, x::AbstractDataset, est::SymbolicPermutation, α::Real = 1; base = 2) → Real
genentropy!(s::Vector{Int}, x::AbstractVector{<:Real}, est::SymbolicPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the generalized order-α entropy over a permutation symbolization of x, using symbol size/order m.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

A pre-allocated symbol array s can be provided to save some memory allocations if probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.
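
A minimal sketch of this pattern for many equal-length signals (assuming the default m = 3, τ = 1 and a parameterless SymbolicPermutation() constructor):

using Entropies

m, τ = 3, 1
signals = [rand(1000) for _ in 1:100]  # many univariate signals of equal length
s = zeros(Int, 1000 - (m - 1)*τ)       # symbol vector, allocated once and reused

hs = [Entropies.genentropy!(s, x, SymbolicPermutation(), 1; m = m, τ = τ, base = 2) for x in signals]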

Probability and entropy estimation

An unordered symbol frequency histogram is obtained by symbolizing the points in x, using probabilities(::AbstractDataset, ::SymbolicPermutation). Sum-normalizing this histogram yields a probability distribution over the symbols.

After the symbolization histogram/distribution has been obtained, the order α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.

Generalized entropy order vs. permutation order

Do not confuse the order of the generalized entropy (α) with the order m of the permutation entropy (which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
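
Example

A minimal sketch for a univariate signal (assuming the parameterless SymbolicPermutation() constructor):

using Entropies

x = rand(10000)  # a univariate signal

# Order-1 permutation entropy to base 2, using embedding dimension m = 3 and lag τ = 1.
# For white noise all 3! = 6 ordinal patterns are roughly equally likely, so the
# result should be close to log2(6) ≈ 2.585.
Entropies.genentropy(x, SymbolicPermutation(), 1; m = 3, τ = 1, base = 2)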

See also: SymbolicPermutation, genentropy.


Weighted permutation entropy

genentropy(x::AbstractDataset, est::SymbolicWeightedPermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicWeightedPermutation, α::Real = 1; m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the order-α generalized entropy based on a weighted permutation symbolization of x, using symbol size/order m for the permutations.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

Probability and entropy estimation

An unordered symbol frequency histogram is obtained by symbolizing the points in x by a weighted procedure, using probabilities(::AbstractDataset, ::SymbolicWeightedPermutation). Sum-normalizing this histogram yields a probability distribution over the weighted symbols.

After the symbolization histogram/distribution has been obtained, the order α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.

Generalized entropy order vs. permutation order

Do not confuse the order of the generalized entropy (α) with the order m of the permutation entropy (which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.

See also: SymbolicWeightedPermutation, genentropy.


Amplitude-aware permutation entropy

genentropy(x::AbstractDataset, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; base = 2) → Real
genentropy(x::AbstractVector{<:Real}, est::SymbolicAmplitudeAwarePermutation, α::Real = 1; 
    m::Int = 3, τ::Int = 1, base = 2) → Real

Compute the order-α generalized entropy based on an amplitude-sensitive permutation symbolization of x, using symbol size/order m for the permutations.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

Probability and entropy estimation

An unordered symbol frequency histogram is obtained by symbolizing the points in x by an amplitude-aware procedure, using probabilities(::AbstractDataset, ::SymbolicAmplitudeAwarePermutation). Sum-normalizing this histogram yields a probability distribution over the amplitude-encoding symbols.

After the symbolization histogram/distribution has been obtained, the order α generalized entropy[Rényi1960], to the given base, is computed from that sum-normalized symbol distribution, using genentropy.

Generalized entropy order vs. permutation order

Do not confuse the order of the generalized entropy (α) with the order m of the permutation entropy (which controls the symbol size). Permutation entropy is usually estimated with α = 1, but the implementation here allows the generalized entropy of any order to be computed from the symbol frequency distribution.
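
Example

All three permutation-based estimators share the same calling signature, so they are straightforward to compare on one signal (a sketch; for white noise the three values should be similar):

using Entropies

x = rand(10000)
ests = [SymbolicPermutation(), SymbolicWeightedPermutation(), SymbolicAmplitudeAwarePermutation()]

# Order-1 entropy to base 2 under each symbolization scheme
[Entropies.genentropy(x, est, 1; m = 3, τ = 1, base = 2) for est in ests]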

See also: SymbolicAmplitudeAwarePermutation, genentropy.

  • [Rényi1960] A. Rényi, Proceedings of the Fourth Berkeley Symposium on Mathematics, Statistics and Probability, pp. 547 (1960)
  • [Shannon1948] C. E. Shannon, Bell System Technical Journal 27, pp. 379 (1948)