Visitation frequency (binning)
Entropies.VisitationFrequency
VisitationFrequency(r::RectangularBinning) <: BinningProbabilitiesEstimator
A probability estimator based on binning data into rectangular boxes dictated by the binning scheme r.
Example
using Entropies
# Construct boxes by dividing each coordinate axis into 5 equal-length chunks.
b = RectangularBinning(5)
# A probabilities estimator that, when applied to a dataset, computes visitation frequencies
# over the boxes of the binning, constructed as described above.
est = VisitationFrequency(b)
See also: RectangularBinning.
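The counting idea behind a visitation-frequency estimator can be sketched in plain Julia without the package. The helper `visitation_frequencies` below is hypothetical (it is not part of Entropies.jl), and it assumes the grid is already given by an origin and one edge length per axis. It maps each point to integer box coordinates, counts the points in each box, and divides by the total number of points.

```julia
# Hypothetical helper, not an Entropies.jl function: relative frequency of
# points in each visited box of a regular grid (origin + edge length per axis).
function visitation_frequencies(points::Vector{<:AbstractVector{<:Real}},
                                origin::Vector{Float64},
                                edgelengths::Vector{Float64})
    counts = Dict{Vector{Int},Int}()
    for p in points
        # Integer box coordinates of p along each axis
        box = [floor(Int, (p[i] - origin[i]) / edgelengths[i]) for i in eachindex(origin)]
        counts[box] = get(counts, box, 0) + 1
    end
    n = length(points)
    # Normalize counts to frequencies; only visited boxes have an entry
    return Dict(box => c / n for (box, c) in counts)
end

pts = [[0.1, 0.1], [0.2, 0.3], [0.7, 0.9], [0.8, 0.6]]
freqs = visitation_frequencies(pts, [0.0, 0.0], [0.5, 0.5])
```

Here the four points fall into two boxes, two points each, so each visited box gets frequency 0.5 and the frequencies sum to 1.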
Specifying binning/boxes
Entropies.RectangularBinning
RectangularBinning(ϵ) <: RectangularBinningScheme
Instructions for creating a rectangular box partition using the binning scheme ϵ. Binning instructions are deduced from the type of ϵ.
Rectangular binnings may be automatically adjusted to the data to which the RectangularBinning is applied, as follows:
ϵ::Int divides each coordinate axis into ϵ equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.
ϵ::Float64 divides each coordinate axis into intervals of fixed size ϵ, starting from the axis minima until the data is completely covered by boxes.
ϵ::Vector{Int} divides the i-th coordinate axis into ϵ[i] equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.
ϵ::Vector{Float64} divides the i-th coordinate axis into intervals of fixed size ϵ[i], starting from the axis minima until the data is completely covered by boxes.
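To make the four rules concrete, here is a per-axis sketch. The names `axis_grid`, `axis_grids` and `box_index` are hypothetical and are not Entropies.jl internals. The padding assumes that "1/100th of a bin size" refers to the unpadded interval length (hi - lo)/n, which may differ from what the package actually does.

```julia
# Hypothetical helpers (not Entropies.jl internals): turn one coordinate axis
# with data range [lo, hi] into (origin, interval length, number of intervals).

# ϵ::Int, or ϵ[i] from a Vector{Int}: n equal-length intervals, with the upper
# bound extended by 1/100th of a bin (assumed: of the unpadded bin length).
function axis_grid(lo::Real, hi::Real, n::Int)
    h = (hi - lo) / n                     # unpadded interval length
    edgelength = (hi + h / 100 - lo) / n  # stretch the covered range slightly
    return float(lo), edgelength, n
end

# ϵ::Float64, or ϵ[i] from a Vector{Float64}: intervals of fixed size ϵ starting
# at the minimum, adding intervals until the maximum is covered.
function axis_grid(lo::Real, hi::Real, ϵ::Float64)
    n = floor(Int, (hi - lo) / ϵ) + 1
    return float(lo), ϵ, n
end

# Vector rules: apply the scalar rule axis by axis.
axis_grids(mins, maxs, ϵs::Vector) = [axis_grid(mins[i], maxs[i], ϵs[i]) for i in eachindex(ϵs)]

# 1-based interval index of value x on an axis with the given origin and length.
box_index(x, origin, edgelength) = floor(Int, (x - origin) / edgelength) + 1

xs = [0.0, 0.25, 0.5, 1.0]
o, w, n = axis_grid(minimum(xs), maximum(xs), 4)
idx = [box_index(x, o, w) for x in xs]  # the maximum lands in interval 4, not 5
grids = axis_grids([0.0, 0.0], [1.0, 1.0], [4, 3])
```

Without the padding, the maximum 1.0 would get index 5 and fall outside the 4 intervals. The padding slightly lengthens every interval, which is also why 0.25 lands in interval 1 in this sketch.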
Rectangular binnings may also be specified on arbitrary min-max ranges.
ϵ::Tuple{Vector{Tuple{Float64,Float64}},Int64} creates intervals along each coordinate axis from ranges indicated by a vector of (min, max) tuples, then divides each coordinate axis into an integer number of equal-length intervals. Note: this does not ensure that all data points are covered by the binning (points outside the binning are ignored).
Example 1: Grid deduced automatically from data (partition guaranteed to cover data points)
Flexible box sizes
The following binning specification finds the minima/maxima along each coordinate axis, then splits each of those data ranges (with some tiny padding on the edges) into 10 equal-length intervals. This gives (hyper-)rectangular boxes and works for data of any dimension.
using Entropies
RectangularBinning(10)
Now, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than along the other.
The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis (with some tiny padding on the edges) into 10 equal-length intervals, and the range along the second coordinate axis (with some tiny padding on the edges) into 5 equal-length intervals. This gives (hyper-)rectangular boxes.
using Entropies
RectangularBinning([10, 5])
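The [10, 5] case can be checked by hand. The sketch below uses made-up per-axis minima and maxima, and assumes the padding is 1/100th of an unpadded interval length; it is plain arithmetic and does not call Entropies.jl.

```julia
# Worked arithmetic for RectangularBinning([10, 5]) on a toy 2-D dataset.
# Assumption: padding of the upper bound = 1/100th of an unpadded interval.
mins = [0.0, -1.0]   # per-axis minima of the data (made-up values)
maxs = [2.0, 1.5]    # per-axis maxima of the data (made-up values)
ns   = [10, 5]       # intervals per axis

h = (maxs .- mins) ./ ns                        # unpadded lengths: [0.2, 0.5]
edgelengths = (maxs .+ h ./ 100 .- mins) ./ ns  # padded lengths, slightly larger
nboxes_total = prod(ns)                         # 10 × 5 boxes in the grid
```

The grid therefore has 50 boxes, each 0.2002 wide along the first axis and 0.501 wide along the second; only the boxes that actually contain points contribute to the visitation frequencies.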
Fixed box sizes
The following binning specification finds the minima/maxima along each coordinate axis, then splits the axis ranges into equal-length intervals of fixed size 0.5 until all data points are covered by boxes. This approach yields (hyper-)cubic boxes and works for data of any dimension.
using Entropies
RectangularBinning(0.5)
Again, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than along the other.
The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis into equal-length intervals of size 0.3, and the range along the second axis into equal-length intervals of size 0.1 (in both cases making sure the data are completely covered by the boxes). This approach gives (hyper-)rectangular boxes.
using Entropies
RectangularBinning([0.3, 0.1])
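For fixed sizes, the number of boxes along each axis follows from the data range and the interval size. This arithmetic sketch uses made-up data ranges and counts intervals starting at the minimum until the maximum is covered; it does not call Entropies.jl.

```julia
# Worked arithmetic for RectangularBinning([0.3, 0.1]) on a toy 2-D dataset.
mins = [0.0, 0.0]    # per-axis minima of the data (made-up values)
maxs = [0.95, 0.45]  # per-axis maxima of the data (made-up values)
ϵs   = [0.3, 0.1]    # fixed interval size per axis

nboxes = floor.(Int, (maxs .- mins) ./ ϵs) .+ 1  # intervals needed per axis
covered_up_to = mins .+ nboxes .* ϵs             # upper edge of the last interval
```

This gives 4 intervals along the first axis and 5 along the second, reaching past the data maxima. With the same size on every axis, as in RectangularBinning(0.5), the same computation yields cubic boxes.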
Example 2: Custom grids (partition not guaranteed to cover data points)
Assume the data consists of 3-dimensional points (x, y, z), and that we want a grid that is fixed over the intervals [x₁, x₂] for the first dimension, over [y₁, y₂] for the second dimension, and over [z₁, z₂] for the third dimension. We then want to split each of those ranges into 4 equal-length pieces. Beware: some points may fall outside the partition if the intervals are not chosen properly (these points are simply discarded).
The following binning specification produces the desired (hyper-)rectangular boxes.
using Entropies, DelayEmbeddings
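D = Dataset(rand(100, 3)); # example data; each coordinate is drawn from [0, 1)
# Use Float64 bounds so that ϵ matches Tuple{Vector{Tuple{Float64,Float64}},Int64}
x₁, x₂ = 0.5, 1.0 # not completely covering the data, which are on [0, 1]
y₁, y₂ = -2.0, 1.5 # covering the data, which are on [0, 1]
z₁, z₂ = 0.0, 0.5 # not completely covering the data, which are on [0, 1]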
ϵ = [(x₁, x₂), (y₁, y₂), (z₁, z₂)], 4 # [interval 1, interval 2, ...], n_subdivisions
RectangularBinning(ϵ)
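Which points survive such a custom grid can be sketched with a hypothetical helper `box_or_nothing` (not an Entropies.jl function), reusing the intervals from the example above with 4 subdivisions. It assumes both interval ends are inclusive, with a point exactly at an upper end placed in the last piece; how Entropies.jl itself treats the upper end is an assumption here.

```julia
# Hypothetical sketch (not Entropies.jl code): box of point p on a grid whose
# axis i spans ranges[i] = (min, max), split into n equal pieces.
# Returns `nothing` for points outside the grid, which would be discarded.
function box_or_nothing(p, ranges::Vector{Tuple{Float64,Float64}}, n::Int)
    box = Int[]
    for (i, (lo, hi)) in enumerate(ranges)
        lo <= p[i] <= hi || return nothing  # outside this axis range: discard
        k = floor(Int, (p[i] - lo) / ((hi - lo) / n)) + 1
        push!(box, min(k, n))               # p[i] == hi goes into the last piece
    end
    return box
end

ranges = [(0.5, 1.0), (-2.0, 1.5), (0.0, 0.5)]
pts = [[0.6, 0.0, 0.1], [0.2, 0.0, 0.1], [0.9, 1.0, 0.7], [1.0, 1.5, 0.5]]
boxes = [box_or_nothing(p, ranges, 4) for p in pts]
kept = filter(!isnothing, boxes)
```

The second point lies below x₁ and the third above z₂, so only two of the four points are assigned to boxes.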