Probabilities · Entropies.jl

Entropies.probabilities — Method

Permutation-based symbol probabilities

probabilities(x::AbstractDataset, est::SymbolicPermutation) → Vector{<:Real} 
probabilities(x::AbstractVector, est::SymbolicPermutation;  m::Int = 2, τ::Int = 1) → Vector{<:Real} 

probabilities!(s::Vector{Int}, x::AbstractDataset, est::SymbolicPermutation) → Vector{<:Real} 
probabilities!(s::Vector{Int}, x::AbstractVector, est::SymbolicPermutation;  m::Int = 2, τ::Int = 1) → Vector{<:Real}

Compute the unordered probabilities of the occurrence of symbol sequences constructed from the data x.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

A pre-allocated symbol array s can be provided to save some memory allocations if the probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.

See also: SymbolicPermutation.
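The symbolization described above can be illustrated with a minimal sketch in Base Julia. It is not the Entropies.jl implementation: the helper names `delay_vectors` and `ordinal_probabilities` are hypothetical, and it assumes each state vector is encoded by the permutation that sorts it (its ordinal pattern).

```julia
# Illustrative sketch only (Base Julia); not the Entropies.jl implementation.

# Delay vectors (x[i], x[i+τ], ..., x[i+(m-1)τ]) of a univariate signal.
delay_vectors(x, m, τ) = [x[i:τ:i + (m - 1) * τ] for i in 1:length(x) - (m - 1) * τ]

# Relative frequency of each ordinal pattern, returned in no particular order.
function ordinal_probabilities(x::AbstractVector{<:Real}; m::Int = 3, τ::Int = 1)
    counts = Dict{Vector{Int}, Int}()
    vecs = delay_vectors(x, m, τ)
    for v in vecs
        pattern = sortperm(v)   # assumed symbol: the permutation that sorts v
        counts[pattern] = get(counts, pattern, 0) + 1
    end
    return collect(values(counts)) ./ length(vecs)
end

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
ps = ordinal_probabilities(x; m = 3, τ = 1)

# Number of state vectors, i.e. the length a pre-allocated `s` would need:
n_vectors = length(x) - (3 - 1) * 1   # 7 - 2 = 5
```

For this signal, three distinct patterns occur among the five state vectors, so `ps` holds the values 0.4, 0.4 and 0.2 in some order.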


Entropies.probabilities — Method

Weighted permutation-based symbol probabilities

probabilities(x::AbstractDataset, est::SymbolicWeightedPermutation) → Vector{<:Real}  
probabilities(x::AbstractVector{<:Real}, est::SymbolicWeightedPermutation; m::Int = 3, τ::Int = 1) → Vector{<:Real}

probabilities!(s::Vector{Int}, x::AbstractDataset, est::SymbolicWeightedPermutation) → Vector{<:Real}  
probabilities!(s::Vector{Int}, x::AbstractVector, est::SymbolicWeightedPermutation; m::Int = 3, τ::Int = 1) → Vector{<:Real}

Compute the unordered probabilities of the occurrence of weighted symbol sequences constructed from x.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

A pre-allocated symbol array s can be provided to save some memory allocations if the probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.
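As a rough illustration of what "weighted" can mean here, the sketch below accumulates a weight per ordinal pattern instead of a count. It is not the Entropies.jl implementation: the function name is hypothetical, and the choice of weight (the population variance of each state vector, as in the usual definition of weighted permutation entropy) is an assumption not stated in this docstring.

```julia
# Illustrative sketch only (Base Julia); not the Entropies.jl implementation.
function weighted_ordinal_probabilities(x::AbstractVector{<:Real}; m::Int = 3, τ::Int = 1)
    weights = Dict{Vector{Int}, Float64}()
    for i in 1:length(x) - (m - 1) * τ
        v = x[i:τ:i + (m - 1) * τ]      # state vector from delay reconstruction
        μ = sum(v) / m
        w = sum(abs2, v .- μ) / m       # assumed weight: population variance of v
        pattern = sortperm(v)           # ordinal pattern of v
        weights[pattern] = get(weights, pattern, 0.0) + w
    end
    total = sum(values(weights))        # zero for a constant signal, which this sketch does not handle
    return [w / total for w in values(weights)]
end

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
ps_weighted = weighted_ordinal_probabilities(x; m = 3, τ = 1)
```

Patterns produced by large-amplitude fluctuations contribute more probability mass than they would under plain counting.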


Entropies.probabilities — Method

Amplitude-aware permutation-based symbol probabilities

probabilities(x::AbstractDataset, est::SymbolicAmplitudeAwarePermutation) → Vector{<:Real}  
probabilities(x::AbstractVector{<:Real}, est::SymbolicAmplitudeAwarePermutation; m::Int = 3, τ::Int = 1) → Vector{<:Real}

probabilities!(s::Vector{Int}, x::AbstractDataset, est::SymbolicAmplitudeAwarePermutation) → Vector{<:Real}  
probabilities!(s::Vector{Int}, x::AbstractVector, est::SymbolicAmplitudeAwarePermutation; m::Int = 3, τ::Int = 1) → Vector{<:Real}

Compute the unordered probabilities of the occurrence of amplitude-encoding symbol sequences constructed from x.

If x is a multivariate Dataset, then symbolization is performed directly on the state vectors. If x is a univariate signal, then a delay reconstruction with embedding lag τ and embedding dimension m is used to construct state vectors, on which symbolization is then performed.

A pre-allocated symbol array s can be provided to save some memory allocations if the probabilities are to be computed for multiple data sets. If provided, it is required that length(x) == length(s) if x is a Dataset, or length(s) == length(x) - (m-1)τ if x is a univariate signal.
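The sketch below shows one way amplitude information could enter the pattern probabilities. It is not the Entropies.jl implementation: the function name and the parameter `A` are hypothetical, and the weight formula (a mix of the mean absolute amplitude and the mean absolute successive difference of each state vector, following the usual definition of amplitude-aware permutation entropy) is an assumption not stated in this docstring.

```julia
# Illustrative sketch only (Base Julia); not the Entropies.jl implementation.
# `A` (between 0 and 1) is a hypothetical parameter balancing amplitudes against differences.
function amplitude_aware_probabilities(x::AbstractVector{<:Real};
                                       m::Int = 3, τ::Int = 1, A::Real = 0.5)
    weights = Dict{Vector{Int}, Float64}()
    for i in 1:length(x) - (m - 1) * τ
        v = x[i:τ:i + (m - 1) * τ]      # state vector from delay reconstruction
        # Assumed weight: mean absolute amplitude plus mean absolute successive difference.
        w = A / m * sum(abs, v) + (1 - A) / (m - 1) * sum(abs, diff(v))
        pattern = sortperm(v)           # ordinal pattern of v
        weights[pattern] = get(weights, pattern, 0.0) + w
    end
    total = sum(values(weights))        # zero for an all-zero signal, which this sketch does not handle
    return [w / total for w in values(weights)]
end

x = [4.0, 7.0, 9.0, 10.0, 6.0, 11.0, 3.0]
ps_amplitude = amplitude_aware_probabilities(x; m = 3, τ = 1, A = 0.5)
```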


Entropies.probabilities — Method

Probabilities based on binning (visitation frequency)

probabilities(x::AbstractDataset, est::VisitationFrequency) → Vector{Real}

Superimpose a rectangular grid (bins/boxes) dictated by est over the data x and return the sum-normalized histogram (i.e. the frequency at which the points of x visit the bins/boxes in the grid) in an unordered 1D form, discarding all non-visited bins and bin edge information.

Performance Notes

This method has linearithmic time complexity (n log(n) for n = length(data)) and linear space complexity (l for l = dimension(data)). Because only visited bins are stored, histograms can be computed for high-dimensional datasets and small box sizes ε without exhausting memory.
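One way to reach n log(n) time without allocating the full grid is to assign each point an integer bin coordinate, sort those coordinates, and count runs of equal bins. The sketch below takes that approach in Base Julia. It is an assumption about how such a histogram can be built, not the Entropies.jl implementation: the function name `visitation_probabilities` is hypothetical, and the data are assumed to have a non-zero range along every axis.

```julia
# Illustrative sketch only (Base Julia); not the Entropies.jl implementation.
function visitation_probabilities(points::Vector{<:AbstractVector{<:Real}}, n_intervals::Int)
    dim = length(first(points))
    mins = [minimum(p[d] for p in points) for d in 1:dim]
    maxs = [maximum(p[d] for p in points) for d in 1:dim]
    widths = (maxs .- mins) ./ n_intervals   # assumes maxs[d] > mins[d] for every axis

    # Integer bin coordinates per point; clamp so the axis maximum lands in the last bin.
    bins = [clamp.(floor.(Int, (p .- mins) ./ widths), 0, n_intervals - 1) for p in points]

    sort!(bins)   # O(n log n); identical bins become adjacent
    probs = Float64[]
    run = 1
    for i in 2:length(bins)
        if bins[i] == bins[i - 1]
            run += 1
        else
            push!(probs, run / length(bins))
            run = 1
        end
    end
    push!(probs, run / length(bins))
    return probs   # one entry per visited bin; non-visited bins never appear
end

pts = [[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.2]]
ps_bins = visitation_probabilities(pts, 2)
```

With two intervals per axis, the first two points share a bin and the other two fall in separate bins, so three of the four boxes are visited.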

Example

using Entropies, DelayEmbeddings
D = Dataset(rand(100, 3))

# How shall the data be partitioned? 
# Here, we subdivide each coordinate axis into 4 equal pieces
# over the range of the data, resulting in rectangular boxes/bins
ϵ = RectangularBinning(4)

# Feed partitioning instructions to estimator.
est = VisitationFrequency(ϵ)

# Estimate a probability distribution over the partition
probabilities(D, est)


Entropies.probabilities — Method

Wavelet-based time-scale probability estimation

probabilities(x::AbstractVector{<:Real}, est::TimeScaleMODWT, α = 1; 
    base = 2) → ps::AbstractVector{<:Real}

Compute the probability distribution of energies from a maximal overlap discrete wavelet transform (MODWT) of x. The probability ps[i] is the relative energy at the i-th wavelet scale, i.e. the energy at that scale divided by the total energy over all scales.

using Entropies, Wavelets
N = 200
a = 10
t = LinRange(0, 2*a*π, N)
x = sin.(t .+  cos.(t/0.1)) .- 0.1;

# Pick a wavelet (if no wavelet provided, defaults to Wavelets.WT.Daubechies{12}())
wl = Wavelets.WT.Daubechies{12}()

# Compute the probabilities (relative energies) at the different wavelet scales
Entropies.probabilities(x, TimeScaleMODWT(wl))
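
The normalization behind "relative energies" can be sketched in Base Julia as follows. This is not the Entropies.jl implementation: the function name `relative_energies` is hypothetical, the matrix `W` is a made-up stand-in for transform coefficients (one column per scale), and the energy of a scale is assumed to be the sum of its squared coefficients.

```julia
# Illustrative sketch only (Base Julia); not the Entropies.jl implementation.
# W[:, j] holds hypothetical coefficients at scale j.
function relative_energies(W::AbstractMatrix{<:Real})
    energies = vec(sum(abs2, W; dims = 1))   # assumed energy per scale: sum of squared coefficients
    return energies ./ sum(energies)         # normalize so the entries sum to 1
end

# Stand-in coefficients for two scales (not the output of a real MODWT).
W = [1.0 2.0;
     1.0 2.0]
ps_scales = relative_energies(W)
```

Here the per-scale energies are 2 and 8, so `ps_scales` is `[0.2, 0.8]`.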