Kernel density

Entropies.NaiveKernel — Type
NaiveKernel(ϵ::Real, method::KernelEstimationMethod = TreeDistance()) <: ProbabilitiesEstimator

Estimate probabilities/entropy using a "naive" kernel density estimation approach (KDE), as discussed in Prichard and Theiler (1995) [PrichardTheiler1995].

A probability $P_i(\mathbf{x}_i, \epsilon)$ is assigned to every point $\mathbf{x}_i$ by counting how many other points occupy the space spanned by a hypersphere of radius $\epsilon$ around $\mathbf{x}_i$, according to:

\[P_i( \mathbf{x}, \epsilon) \approx \dfrac{1}{N} \sum_{s \neq i } K\left( \dfrac{||\mathbf{x}_i - \mathbf{x}_s ||}{\epsilon} \right),\]

where $K(z) = 1$ if $z < 1$ and zero otherwise. Probabilities are then normalized.
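The formula above can be sketched in plain Julia. This is an illustrative re-implementation of the boxcar-kernel estimate, not the package's actual (tree- or direct-distance) implementation; the function name `naivekernel_probs` is made up for this example.

```julia
using LinearAlgebra: norm

# Naive kernel probabilities via direct pairwise distances.
# `pts` is a vector of points (each an AbstractVector); `ϵ` is the kernel radius.
function naivekernel_probs(pts::Vector{<:AbstractVector}, ϵ::Real)
    N = length(pts)
    counts = zeros(N)
    for i in 1:N, s in 1:N
        s == i && continue
        # Boxcar kernel K(z) = 1 for z < 1: count points within radius ϵ of pts[i]
        counts[i] += norm(pts[i] - pts[s]) < ϵ
    end
    p = counts ./ N
    return p ./ sum(p)   # normalize so the probabilities sum to 1
end
```

Note that this naive double loop is O(N²); the tree-based method exists precisely to avoid evaluating all pairwise distances.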

Methods

  • Tree-based evaluation of distances using TreeDistance. Faster, but allocates more memory.
  • Direct evaluation of distances using DirectDistance. Slower, but allocates less memory. Also works for complex numbers.

Estimation

Probabilities or entropies can be estimated from a Dataset.

  • probabilities(x::AbstractDataset, est::NaiveKernel). Associates a probability p with each point in x.
  • genentropy(x::AbstractDataset, est::NaiveKernel). Associates a probability p with each point in x, then computes the generalized entropy from those probabilities.
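The two calls above can be sketched as follows. This is a minimal usage sketch; the dataset, dimension, and radius are arbitrary choices, and genentropy is called with its default parameters.

```julia
using Entropies, DelayEmbeddings

x = Dataset([rand(3) for _ in 1:2000])
est = NaiveKernel(0.3)

p = probabilities(x, est)   # one probability per point of x
h = genentropy(x, est)      # generalized entropy computed from those same probabilities
```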

Examples

using Entropies, DelayEmbeddings
pts = Dataset([rand(5) for i = 1:10000]);
ϵ = 0.2
est_direct = NaiveKernel(ϵ, DirectDistance())
est_tree = NaiveKernel(ϵ, TreeDistance())

p_direct = probabilities(pts, est_direct)
p_tree = probabilities(pts, est_tree)

# Check that both methods give the same probabilities
all(p_direct .== p_tree)

See also: DirectDistance, TreeDistance.


Distance evaluation methods

Entropies.TreeDistance — Type
TreeDistance(metric::M = Euclidean()) <: KernelEstimationMethod

Pairwise distances are evaluated using a tree-based approach with the provided metric.

Entropies.DirectDistance — Type
DirectDistance(metric::M = Euclidean()) <: KernelEstimationMethod

Pairwise distances are evaluated directly using the provided metric.

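Both methods accept a metric from Distances.jl, defaulting to Euclidean. A minimal sketch, assuming here that Chebyshev works with both backends:

```julia
using Entropies, Distances

# Swap in any Distances.jl metric; Chebyshev is an illustrative choice.
est_tree = NaiveKernel(0.2, TreeDistance(Chebyshev()))
est_direct = NaiveKernel(0.2, DirectDistance(Chebyshev()))
```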

Example

Here, we draw some random points from a 2D normal distribution. Then, we use kernel density estimation to associate a probability with each point p, measured by how many points lie within radius 1.5 of p. Plotting the actual points, along with their associated probabilities estimated by the KDE procedure, we get the following surface plot.

using Distributions, PyPlot, DelayEmbeddings, Entropies
𝒩 = MvNormal([1, -4], 2)
N = 500
D = Dataset(sort([rand(𝒩) for i = 1:N]))
x, y = columns(D)
p = probabilities(D, NaiveKernel(1.5))
surf(x, y, p.p)
xlabel("x"); ylabel("y")
savefig("kernel_surface.png")

  • [PrichardTheiler1995]: Prichard, D., & Theiler, J. (1995). Generalized redundancies for time series analysis. Physica D: Nonlinear Phenomena, 84(3-4), 476-493.