# Generalized entropy

The following two functions are used for probability and entropy estimation:

- `probabilities`, which computes probability distributions of given datasets.
- `genentropy`, which uses the output of `probabilities`, or a set of pre-computed `Probabilities`, to calculate entropies.

See the Entropies.jl documentation for details.

`Entropies.genentropy` — Function

`genentropy(p::Probabilities; q = 1.0, base = MathConstants.e)`

Compute the generalized order-`q` entropy of some probabilities returned by the `probabilities` function. Alternatively, compute entropy from pre-computed `Probabilities`.

`genentropy(x::Vector_or_Dataset, est; q = 1.0, base)`

A convenience syntax that first calls `probabilities(x, est)` and then calculates the entropy of the result (thus `est` can be a `ProbabilitiesEstimator` or simply `ε::Real`).

**Description**

Let $p$ be an array of probabilities (summing to 1). Then the generalized (Rényi) entropy [Rényi1960] is

\[H_q(p) = \frac{1}{1-q} \log \left(\sum_i p[i]^q\right)\]

and generalizes other known entropies, such as the information entropy ($q = 1$, understood as the limit $q \to 1$ because the prefactor $\frac{1}{1-q}$ is undefined there, see [Shannon1948]), the maximum entropy ($q = 0$, also known as Hartley entropy), and the correlation entropy ($q = 2$, also known as collision entropy).
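The formula above can be sketched in plain Julia. This is only an illustration of the definition, not the Entropies.jl implementation: the name `renyi_entropy` is hypothetical, and treating `q ≈ 1` as the Shannon limit and skipping zero-probability entries are assumptions of this sketch.

```julia
# Hypothetical helper, not part of Entropies.jl.
# Computes H_q(p) = log(sum(p_i^q)) / (1 - q); the q → 1 limit is handled
# separately as the Shannon entropy -sum(p_i * log(p_i)).
function renyi_entropy(p::AbstractVector{<:Real}; q::Real = 1.0, base::Real = MathConstants.e)
    all(>=(0), p) || throw(ArgumentError("probabilities must be non-negative"))
    isapprox(sum(p), 1; atol = 1e-8) || throw(ArgumentError("probabilities must sum to 1"))
    nz = filter(>(0), p)               # zero-probability outcomes contribute nothing
    h = if q ≈ 1
        -sum(x -> x * log(x), nz)      # Shannon limit (natural log)
    else
        log(sum(x -> x^q, nz)) / (1 - q)
    end
    return h / log(base)               # convert from nats to the requested base
end

p = [0.5, 0.25, 0.25]
h_shannon   = renyi_entropy(p; q = 1, base = 2)  # information entropy in bits
h_hartley   = renyi_entropy(p; q = 0, base = 2)  # log2 of the number of nonzero outcomes
h_collision = renyi_entropy(p; q = 2, base = 2)  # correlation (collision) entropy
```

For a uniform distribution all orders `q` give the same value, `log(n)` for `n` outcomes, which is a convenient sanity check.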

`Entropies.Probabilities` — Type

`Probabilities(x) → p`

A simple wrapper type around an `x::AbstractVector` which ensures that `p` sums to 1. Behaves identically to `Vector`.
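To make the idea of a normalizing, `Vector`-like wrapper concrete, here is a minimal sketch in plain Julia. The type name `NormalizedProbs` is hypothetical, and the actual `Probabilities` type may be implemented differently; the sketch only shows one way to guarantee a unit sum at construction time.

```julia
# Hypothetical illustration; not the Entropies.jl `Probabilities` type.
struct NormalizedProbs{T<:AbstractFloat} <: AbstractVector{T}
    p::Vector{T}
    function NormalizedProbs(x::AbstractVector{<:Real})
        s = sum(x)
        s > 0 || throw(ArgumentError("input must have a positive sum"))
        p = collect(float.(x) ./ s)    # rescale so the entries sum to 1
        return new{eltype(p)}(p)
    end
end

# Minimal AbstractVector interface, so indexing, iteration, and `sum` work.
Base.size(n::NormalizedProbs) = size(n.p)
Base.getindex(n::NormalizedProbs, i::Int) = n.p[i]

counts = [2, 1, 1]                     # raw counts, not yet normalized
p = NormalizedProbs(counts)
```

Because normalization happens in the inner constructor, every instance sums to 1 regardless of how the input was scaled.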

`Entropies.probabilities` — Function

`probabilities(x::Vector_or_Dataset, est::ProbabilitiesEstimator) → p::Probabilities`

Calculate probabilities representing `x` based on the provided estimator and return them as a `Probabilities` container (`Vector`-like). The probabilities are typically unordered and may or may not contain 0s; see the documentation of the individual estimators for more.

The configuration options are always given as arguments to the chosen estimator.

`probabilities(x::Vector_or_Dataset, ε::AbstractFloat) → p::Probabilities`

Convenience syntax which provides probabilities for `x` based on rectangular binning (i.e. performing a histogram). In short, the state space is divided into boxes of length `ε`, and formally we use `est = VisitationFrequency(RectangularBinning(ε))` as an estimator, see `VisitationFrequency`.

This method has a linearithmic time complexity (`n log(n)` for `n = length(x)`) and a linear space complexity (`l` for `l = dimension(x)`). This allows computation of probabilities (histograms) of high-dimensional datasets and with small box sizes `ε` without memory overflow and with maximum performance. To obtain the bin information along with `p`, use `binhist`.
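One common way to obtain an `n log(n)` histogram that stores only visited boxes is to label each point with its box index, sort the labels, and count runs of equal labels. The sketch below shows this for one-dimensional data. It is an assumption about the general approach, not a copy of the library's code, and the name `binned_probabilities` and the choice of the data minimum as grid origin are hypothetical.

```julia
# Hypothetical helper illustrating sort-and-count binning in 1D.
function binned_probabilities(x::AbstractVector{<:Real}, ε::Real)
    ε > 0 || throw(ArgumentError("ε must be positive"))
    isempty(x) && throw(ArgumentError("x must not be empty"))
    lo = minimum(x)
    # Integer label of the box each point falls in (boxes of length ε from `lo`).
    labels = [floor(Int, (xi - lo) / ε) for xi in x]
    sort!(labels)                       # O(n log n); equal labels become adjacent
    counts = Int[]
    streak = 1
    for i in 2:length(labels)
        if labels[i] == labels[i - 1]
            streak += 1
        else
            push!(counts, streak)       # close the previous run of equal labels
            streak = 1
        end
    end
    push!(counts, streak)               # close the final run
    return counts ./ length(labels)     # only visited boxes are represented
end

x = [0.0, 0.1, 0.2, 0.9, 0.6, 0.55, 0.7, 0.95]
p = binned_probabilities(x, 0.25)
```

In this example the box covering `[0.25, 0.5)` is never visited, so it does not appear in `p`. Never materializing empty boxes is what keeps memory use independent of how fine the grid is.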

`probabilities(x::Vector_or_Dataset, n::Integer) → p::Probabilities`

Same as the above method, but now each dimension of the data is binned into `n::Int` equal-sized bins instead of bins of length `ε::AbstractFloat`.

`probabilities(x::Vector_or_Dataset) → p::Probabilities`

Directly count probabilities from the elements of `x` without any discretization, binning, or other processing (mostly useful when `x` contains categorical or integer data).
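Direct counting amounts to computing the relative frequency of each distinct element. A minimal sketch in plain Julia follows. The helper name `count_probabilities` is hypothetical, and unlike the documented method, which returns a `Probabilities` vector, this sketch returns a `Dict` so that each probability stays attached to its value.

```julia
# Hypothetical helper: relative frequency of each distinct element of `x`.
function count_probabilities(x::AbstractVector)
    isempty(x) && throw(ArgumentError("x must not be empty"))
    counts = Dict{eltype(x), Int}()
    for xi in x
        counts[xi] = get(counts, xi, 0) + 1   # tally occurrences
    end
    n = length(x)
    return Dict(k => c / n for (k, c) in counts)
end

freqs = count_probabilities(["a", "b", "a", "c", "a", "b"])
```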

`Entropies.probabilities!` — Function

`probabilities!(args...)`

Identical to `probabilities(args...)`, but allows pre-allocation of temporarily used containers.

Only works for certain estimators. See for example `SymbolicPermutation`.

`Entropies.ProbabilitiesEstimator` — Type

An abstract type for probabilities estimators.

`Entropies.binhist` — Function

```
binhist(x::AbstractDataset, ε::Real) → p, bins
binhist(x::AbstractDataset, ε::RectangularBinning) → p, bins
```

Hyper-optimized histogram calculation for `x` with rectangular binning `ε`. Returns the probabilities `p` of each bin of the histogram as well as the bins. Notice that `bins` are the starting corners of each bin. If `ε isa Real`, then the actual bin size is `ε` across each dimension. If `ε isa RectangularBinning`, then the bin size for each dimension will depend on the binning scheme.

See also: `RectangularBinning`.
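To illustrate what "starting corners" means for the returned `bins`, the following plain-Julia sketch builds a histogram over points given as tuples and returns both the probabilities and the lower corner of each visited box. The function name `corner_histogram`, the tuple-based input, and the choice of the per-dimension minimum as grid origin are all assumptions of this sketch; `binhist` itself may place and order its bins differently.

```julia
# Hypothetical helper mimicking the (p, bins) return shape described above;
# not the Entropies.jl implementation.
function corner_histogram(points::AbstractVector{<:Tuple}, ε::Real)
    ε > 0 || throw(ArgumentError("ε must be positive"))
    isempty(points) && throw(ArgumentError("points must not be empty"))
    D = length(first(points))
    # Minimum along each dimension defines the origin of the grid.
    mins = ntuple(d -> minimum(pt[d] for pt in points), D)
    counts = Dict{NTuple{D, Int}, Int}()
    for pt in points
        idx = ntuple(d -> floor(Int, (pt[d] - mins[d]) / ε), D)
        counts[idx] = get(counts, idx, 0) + 1
    end
    idxs = sort!(collect(keys(counts)))          # deterministic bin order
    p = [counts[i] / length(points) for i in idxs]
    # Starting (lowest) corner of each visited box.
    bins = [ntuple(d -> mins[d] + i[d] * ε, D) for i in idxs]
    return p, bins
end

pts = [(0.0, 0.0), (0.1, 0.2), (0.6, 0.1), (0.7, 0.9)]
p, bins = corner_histogram(pts, 0.5)
```

Each entry of `bins` is the corner from which the box extends by `ε` along every dimension, so a point `pt` belongs to the box with corner `c` when `c[d] <= pt[d] < c[d] + ε` for all `d`.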

- [Rényi1960] A. Rényi, *Proceedings of the fourth Berkeley Symposium on Mathematics, Statistics and Probability*, pp 547 (1960)
- [Shannon1948] C. E. Shannon, Bell Systems Technical Journal **27**, pp 379 (1948)