Visitation frequency (binning)
Entropies.VisitationFrequency — Type
VisitationFrequency(r::RectangularBinning) <: BinningProbabilitiesEstimator
A probability estimator based on binning data into rectangular boxes dictated by the binning scheme r.
Example
# Construct boxes by dividing each coordinate axis into 5 equal-length chunks.
b = RectangularBinning(5)
# A probabilities estimator that, when applied to a dataset, computes visitation frequencies
# over the boxes of the binning, constructed as described on the previous line.
est = VisitationFrequency(b)
See also: RectangularBinning.
Specifying binning/boxes
Entropies.RectangularBinning — Type
RectangularBinning(ϵ) <: RectangularBinningScheme
Instructions for creating a rectangular box partition using the binning scheme ϵ. Binning instructions are deduced from the type of ϵ.
Rectangular binnings may be automatically adjusted to the data to which the RectangularBinning is applied, as follows:
ϵ::Int divides each coordinate axis into ϵ equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.
ϵ::Float64 divides each coordinate axis into intervals of fixed size ϵ, starting from the axis minima and continuing until the data is completely covered by boxes.
ϵ::Vector{Int} divides the i-th coordinate axis into ϵ[i] equal-length intervals, extending the upper bound by 1/100th of a bin size to ensure all points are covered.
ϵ::Vector{Float64} divides the i-th coordinate axis into intervals of fixed size ϵ[i], starting from the axis minima and continuing until the data is completely covered by boxes.
Rectangular binnings may also be specified on arbitrary min-max ranges.
ϵ::Tuple{Vector{Tuple{Float64,Float64}},Int64} creates intervals along each coordinate axis from ranges indicated by a vector of (min, max) tuples, then divides each coordinate axis into an integer number of equal-length intervals. Note: this does not ensure that all data points are covered by the binning (points outside the binning are ignored).
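The two scalar cases above can be illustrated with a small, self-contained sketch in plain Julia. The helpers `sketch_edgelength` and `sketch_nbins` are hypothetical names introduced here for illustration. They are not part of Entropies.jl, and the padding formula is only one plausible reading of "extending the upper bound 1/100th of a bin size".

```julia
# Hypothetical helpers (not part of Entropies.jl) illustrating the two scalar cases.

# ϵ::Int: split [lo, hi] into n equal-length intervals, padding the upper bound
# by 1/100th of the unpadded bin size so that `hi` lands inside the last bin.
function sketch_edgelength(lo::Real, hi::Real, n::Int)
    unpadded = (hi - lo) / n
    return (hi + unpadded / 100 - lo) / n
end

# ϵ::Float64: the edge length is fixed; bins are laid out from the axis minimum
# until the maximum is covered. Returns the number of bins needed.
sketch_nbins(lo::Real, hi::Real, ϵ::Float64) = floor(Int, (hi - lo) / ϵ) + 1

x = [0.0, 0.13, 0.5, 0.99, 1.0]       # one coordinate axis of some data
lo, hi = extrema(x)

δ = sketch_edgelength(lo, hi, 5)      # padded edge length for 5 intervals
bins = floor.(Int, (x .- lo) ./ δ)    # 0-based bin index of every point
@assert all(0 .<= bins .<= 4)         # every point falls in one of the 5 bins

n = sketch_nbins(lo, hi, 0.3)         # 4 bins of width 0.3 cover [0, 1]
```

Without the small padding in the integer case, the axis maximum would sit exactly on the upper edge of the last interval and would be assigned to a sixth bin under floor-based indexing.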
Example 1: Grid deduced automatically from data (partition guaranteed to cover data points)
Flexible box sizes
The following binning specification finds the minima/maxima along each coordinate axis, then splits each of those data ranges (with some tiny padding on the edges) into 10 equal-length intervals. This gives (hyper-)rectangular boxes, and works for data of any dimension.
using Entropies
RectangularBinning(10)
Now, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than over the other dimension.
The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis (with some tiny padding on the edges) into 10 equal-length intervals, and the range along the second coordinate axis (with some tiny padding on the edges) into 5 equal-length intervals. This gives (hyper-)rectangular boxes.
using Entropies
RectangularBinning([10, 5])
Fixed box sizes
The following binning specification finds the minima/maxima along each coordinate axis, then splits the axis ranges into equal-length intervals of fixed size 0.5 until all data points are covered by boxes. This approach yields (hyper-)cubic boxes, and works for data of any dimension.
using Entropies
RectangularBinning(0.5)
Again, assume the data consists of 2-dimensional points, and that we want a finer grid along one of the dimensions than over the other dimension.
The following binning specification finds the minima/maxima along each coordinate axis, then splits the range along the first coordinate axis into equal-length intervals of size 0.3, and the range along the second axis into equal-length intervals of size 0.1 (in both cases, making sure the data are completely covered by the boxes). This approach gives (hyper-)rectangular boxes.
using Entropies
RectangularBinning([0.3, 0.1])
Example 2: Custom grids (partition not guaranteed to cover data points):
Assume the data consists of 3-dimensional points (x, y, z), and that we want a grid that is fixed over the intervals [x₁, x₂] for the first dimension, over [y₁, y₂] for the second dimension, and over [z₁, z₂] for the third dimension. We then want to split each of those ranges into 4 equal-length pieces. Beware: some points may fall outside the partition if the intervals are not chosen properly (these points are simply discarded).
The following binning specification produces the desired (hyper-)rectangular boxes.
using Entropies, DelayEmbeddings
D = Dataset(rand(100, 3));
# Float64 bounds, matching the documented Tuple{Vector{Tuple{Float64,Float64}},Int64} type
x₁, x₂ = 0.5, 1.0 # not completely covering the data, which are on [0, 1]
y₁, y₂ = -2.0, 1.5 # covering the data, which are on [0, 1]
z₁, z₂ = 0.0, 0.5 # not completely covering the data, which are on [0, 1]
ϵ = [(x₁, x₂), (y₁, y₂), (z₁, z₂)], 4 # [interval 1, interval 2, ...], n_subdivisions
RectangularBinning(ϵ)
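To make the "points may fall outside the partition" caveat concrete, here is a minimal plain-Julia sketch of which points a custom grid would cover. It is not the Entropies.jl implementation: the helper `is_covered` is hypothetical, and treating both interval ends as inclusive is an assumption.

```julia
# Hypothetical sketch (not the Entropies.jl implementation).
intervals = [(0.5, 1.0), (-2.0, 1.5), (0.0, 0.5)]   # (min, max) per axis
n = 4                                               # subdivisions per axis

# Edge length of the boxes along each axis.
edgelengths = [(hi - lo) / n for (lo, hi) in intervals]

# A point is covered only if every coordinate lies inside its axis interval
# (inclusive bounds are an assumption made for this sketch).
is_covered(p) = all(lo <= p[i] <= hi for (i, (lo, hi)) in enumerate(intervals))

pts = [[0.75, 0.2, 0.1],   # inside all three intervals
       [0.25, 0.2, 0.1],   # x is below 0.5
       [0.9, 0.9, 0.9]]    # z is above 0.5
kept = filter(is_covered, pts)   # only the first point survives
```

With uniformly distributed data on [0, 1] in all three dimensions, a grid like this one would be expected to discard roughly three quarters of the points, since only half of the x range and half of the z range are covered.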
Utility methods
Some convenience functions for bin encoding are provided.
Entropies.encode_as_bin — Function
encode_as_bin(point, reference_point, edgelengths) → Vector{Int}
Encode a point into its integer bin labels relative to some reference_point (always counting from lowest to highest magnitudes), given a set of box edgelengths (one for each axis). The first bin on the positive side of the reference point is indexed with 0, and the first bin on the negative side of the reference point is indexed with -1.
See also: joint_visits, marginal_visits.
Example
using Entropies
refpoint = [0, 0, 0]
steps = [0.2, 0.2, 0.3]
encode_as_bin(rand(3), refpoint, steps)
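The indexing convention described above (0 for the first bin on the positive side, -1 for the first bin on the negative side) is what a floor-based encoding produces. The following plain-Julia sketch assumes that reading. `sketch_encode` is a hypothetical stand-in and may differ from the actual `encode_as_bin` in edge cases.

```julia
# Hypothetical re-implementation, for illustration only.
sketch_encode(point, reference_point, edgelengths) =
    floor.(Int, (point .- reference_point) ./ edgelengths)

refpoint = [0.0, 0.0, 0.0]
steps = [0.2, 0.2, 0.3]

a = sketch_encode([0.1, 0.5, 0.7], refpoint, steps)     # [0, 2, 2]
b = sketch_encode([-0.1, 0.1, -0.4], refpoint, steps)   # [-1, 0, -2]
```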
Entropies.joint_visits — Function
joint_visits(points, binning_scheme::RectangularBinning) → Vector{Vector{Int}}
Determine which bins are visited by points given the rectangular binning scheme binning_scheme. Bins are referenced relative to the axis minima, and are encoded as integers, such that each box in the binning is assigned a unique integer array (one element for each dimension).
For example, if a bin is visited three times, then the corresponding integer array will appear three times in the array returned.
See also: marginal_visits, encode_as_bin.
Example
using DelayEmbeddings, Entropies
pts = Dataset([rand(5) for i = 1:100]);
joint_visits(pts, RectangularBinning(0.2))
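Because each visited bin appears once per visit in the returned array, visitation frequencies can be obtained by counting repeated labels. The sketch below uses a hand-written `visits` vector as a stand-in for the output of `joint_visits`. It only illustrates the counting step and is not necessarily how `VisitationFrequency` computes its probabilities internally.

```julia
# Stand-in for the output of joint_visits: one integer bin label per point.
visits = [[0, 1], [2, 1], [0, 1], [0, 1], [2, 3]]

# Count how many times each bin label occurs.
counts = Dict{Vector{Int}, Int}()
for v in visits
    counts[v] = get(counts, v, 0) + 1
end

# Relative visitation frequency of each visited bin.
freqs = Dict(bin => c / length(visits) for (bin, c) in counts)
```

Here the bin `[0, 1]` is visited three times out of five, so its relative frequency is 0.6, and the frequencies over all visited bins sum to 1.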
Entropies.marginal_visits — Function
marginal_visits(points, binning_scheme::RectangularBinning, dims) → Vector{Vector{Int}}
Determine which bins are visited by points given the rectangular binning scheme binning_scheme, but only along the desired dimensions dims. Bins are referenced relative to the axis minima, and are encoded as integers, such that each box in the binning is assigned a unique integer array (one element for each dimension in dims).
For example, if a bin is visited three times, then the corresponding integer array will appear three times in the array returned.
See also: joint_visits, encode_as_bin.
Example
using DelayEmbeddings, Entropies
pts = Dataset([rand(5) for i = 1:100]);
# Marginal visits along dimensions 3 and 5
marginal_visits(pts, RectangularBinning(0.3), [3, 5])
# Marginal visits along dimensions 2 through 5
marginal_visits(pts, RectangularBinning(0.3), 2:5)
marginal_visits(joint_visits, dims) → Vector{Vector{Int}}
If joint visits have been precomputed using joint_visits, marginal visits can be returned directly without providing the binning again using the marginal_visits(joint_visits, dims) signature.
See also: joint_visits, encode_as_bin.
Example
using DelayEmbeddings, Entropies
pts = Dataset([rand(5) for i = 1:100]);
# First compute joint visits, then marginal visits along dimensions 1 and 4
jv = joint_visits(pts, RectangularBinning(0.2))
marginal_visits(jv, [1, 4])
# Marginals along dimension 2
marginal_visits(jv, 2)
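Conceptually, taking marginals of precomputed joint visits amounts to keeping only the requested components of each bin label. The plain-Julia sketch below assumes this interpretation. `sketch_marginal` is a hypothetical helper, shown only for vector and range `dims`, and the hand-written `jv_demo` stands in for real `joint_visits` output.

```julia
# Hypothetical helper: keep only the components `dims` of every bin label.
sketch_marginal(jv, dims) = [v[dims] for v in jv]

# Stand-in for joint visits of two 5-dimensional points.
jv_demo = [[0, 4, 1, 2, 3], [1, 1, 0, 4, 2]]

m14 = sketch_marginal(jv_demo, [1, 4])   # [[0, 2], [1, 4]]
m25 = sketch_marginal(jv_demo, 2:5)      # [[4, 1, 2, 3], [1, 0, 4, 2]]
```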