Fractal Dimension

There are numerous methods to calculate a so-called "dimension" of a dataset, which in the context of dynamical systems is called the fractal dimension. Several computationally feasible variants of the fractal dimension exist, and a simple usage example is shown in the Fractal dimension example subsection.

Generalized dimension

Based on the definition of the Generalized entropy, one can calculate an appropriate dimension, called generalized dimension:

ChaosTools.generalized_dim (Function)
generalized_dim(dataset [, sizes]; q = 1, base = MathConstants.e) -> D_q

Return the q-order generalized dimension of the dataset, by calculating the genentropy for each ε ∈ sizes.

The case of q = 0 is often called "capacity" or "box-counting" dimension.

Description

The returned dimension is approximated by the (inverse) power law exponent of the scaling of the genentropy versus the box size ε, where ε ∈ sizes.

Calling this function performs a lot of automated steps:

  1. A vector of box sizes is decided by calling sizes = estimate_boxsizes(dataset), if sizes is not given.
  2. For each element of sizes the appropriate entropy is calculated, through h = genentropy.(Ref(dataset), sizes; q, base). Let x = -log.(sizes).
  3. The curve h(x) is decomposed into linear regions, using linear_regions(x, h).
  4. The biggest linear region is chosen, and a fit for the slope of that region is performed using the function linear_region, which does a simple linear regression fit using linreg. This slope is the return value of generalized_dim.

By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result.
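For instance, a minimal sketch of doing these steps manually (mirroring the example section below; the henon system and q = 1 are arbitrary choices here):

using DynamicalSystems
A = trajectory(Systems.henon(), 100_000)
sizes = estimate_boxsizes(A)              # step 1: decide box sizes
h = genentropy.(Ref(A), sizes; q = 1)     # step 2: entropy for each box size
x = -log.(sizes)
# steps 3 & 4: find the largest linear region of h(x) and fit its slope
(i1, i2), slope = linear_region(x, h)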

The following aliases are provided:

  • q = 0 : boxcounting_dim, capacity_dim
  • q = 1 : information_dim
Be wary when using `generalized_dim`

As stated clearly by the documentation string, calling generalized_dim performs a lot of automated steps by calling other functions (see below) with default arguments. It is more of a convenience bundle than an actual algorithm, and therefore you should be careful when considering the validity of the returned number.

Fractal dimension example

For an example of using entropies to compute the dimension of an attractor, let's use everyone's favorite system:

using DynamicalSystems, PyPlot
lor = Systems.lorenz()
3-dimensional continuous dynamical system
 state:       [0.0, 10.0, 0.0]
 rule f:      loop
 in-place?    false
 jacobian:    loop_jac
 parameters:  [10.0, 28.0, 2.66667]

Our goal is to compute entropies for many different partition sizes ε, so let's get down to it:

tr = trajectory(lor, 100.0; Ttr = 10.0)

ες = ℯ .^ (-3.5:0.5:3.5) # semi-random guess
Hs = genentropy.(Ref(tr), ες; q = 1)
15-element Vector{Float64}:
 9.210440366976329
 9.208499748932574
 9.195140334192946
 9.136445496766951
 8.998905310194775
 8.705004397152349
 8.1305312663553
 7.331853533923084
 6.4086579132355626
 5.453352354803625
 4.485476740802796
 3.5260626238745894
 2.606754871093233
 1.8789633825490644
 0.5375831514462661
xs = @. -log(ες)
fig = figure()
plot(xs, Hs)
ylabel("\$H_1\$")
xlabel("\$-\\log (\\epsilon)\$");
fig.tight_layout(pad=0.3); fig

The slope of the linear scaling region of the above plot is the generalized dimension (of order q = 1) for the attractor of the Lorenz system.

Since we can see the plot, we could visually estimate where the linear scaling region starts and ends. Alternatively, we can use the function linear_region to get this estimate automatically. First let's visualize what it does:

lrs, slopes = linear_regions(xs, Hs, tol = 0.25)

fig = figure()
for i in 1:length(lrs)-1
    plot(xs[lrs[i]:lrs[i+1]], Hs[lrs[i]:lrs[i+1]], marker = "o")
end
ylabel("\$H_1\$")
xlabel("\$-\\log (\\epsilon)\$");
fig.tight_layout(pad=0.3); fig

The linear_region function computes the slope of the largest region:

Δ = linear_region(xs, Hs)[2]
1.833384047211349

This result is an approximation of the information dimension (because we used q = 1) of the Lorenz attractor.


The above pipeline is bundled in generalized_dim. For example, the dimension of the strange attractor of the Systems.henon map, following the above approach but taking automated steps, is:

using DynamicalSystems
hen = Systems.henon()
tr = trajectory(hen, 200000)
D_hen = generalized_dim(tr; q = 1)
1.2276426984800552

As a side note, be sure that you have enough data points, otherwise the values you will get will never be correct, as is demonstrated by J.-P. Eckmann and D. Ruelle (see Physica D 56, pp 185-187 (1992)).

Linear scaling regions

And other utilities, especially linreg, used in both generalized_dim and grassberger.

ChaosTools.linear_regions (Function)
linear_regions(x, y; dxi::Int = 1, tol = 0.25) -> (lrs, tangents)

Identify regions where the curve y(x) is linear, by scanning the x-axis every dxi indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15] and so on if dxi=5).

If the slope (calculated via linear regression) of a region of width dxi is approximately equal to that of the previous region, within tolerance tol, then these two regions belong to the same linear region.

Return the indices of x that correspond to linear regions, lrs, and the corresponding tangents at each region (obtained via a second linear regression at each accumulated region).

ChaosTools.linear_region (Function)
linear_region(x, y; kwargs...) -> ((ind1, ind2), slope)

Call linear_regions and identify and return the largest linear region and its slope. The region starts and stops at x[ind1:ind2].

The keywords dxi, tol are propagated as-is to linear_regions. The keyword ignore_saturation = true ignores saturation that (typically) happens at the final points of the curve y(x), where the curve flattens out.

ChaosTools.linreg (Function)
linreg(x, y) -> a, b

Perform a linear regression to find the best coefficients so that the curve: z = a + b*x has the least squared error with y.
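As a quick illustration (the coefficients are arbitrary), fitting a perfectly linear curve should recover its intercept and slope:

using ChaosTools
x = collect(0:0.1:10)
y = @. 2.5 + 0.5x      # exact line: intercept 2.5, slope 0.5
a, b = linreg(x, y)    # expect a ≈ 2.5, b ≈ 0.5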

ChaosTools.estimate_boxsizes (Function)
estimate_boxsizes(A::Dataset; kwargs...) → εs

Return k exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k), that are a good estimate for sizes ε that are used in calculating a fractal dimension. It is strongly recommended to regularize the input dataset A before using this function.

Let d₋ be the minimum pair-wise distance in A and d₊ the maximum distance along each of the dimensions of A. Then lower = log(base, d₋) and upper = log(base, d₊). Because by default w=1, z=-1, the returned sizes are an order of magnitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.

Keywords

  • w = 1, z = -1, k = 20 : as explained above.
  • base = MathConstants.e : the base used in the log function.
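A minimal usage sketch (the system and the k value are arbitrary choices for illustration):

using DynamicalSystems
A = regularize(trajectory(Systems.henon(), 100_000))  # regularize, as recommended
εs = estimate_boxsizes(A; k = 12)                     # 12 exponentially spaced sizes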

Correlation dimension

ChaosTools.correlationsum (Function)
correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)

Calculate the q-order correlation sum of X (Dataset or timeseries) for a given radius ε and norm.

The function boxed_correlationsum is faster and should be preferred over this one.

Description

The correlation sum is computed using the formula:

\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} B(||X_i - X_j|| < \epsilon)\]

for q=2 and

\[C_q(\epsilon) = \left[\frac{1}{\alpha} \sum_{i=w+1}^{N-w}\left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\right]^{1/(q-1)}\]

where

\[\alpha = (N-2w)(N-2w-1)^{(q-1)}\]

for q≠2, where $N$ is the length of X and $B$ gives 1 if its argument is true. w is the Theiler window. If ε is a vector, its values have to be ordered. See Grassberger's article for the general definition [Grassberger2007], and the book "Nonlinear Time Series Analysis" [Kantz2003]: Ch. 6 for a discussion of w and how to choose good values, and Ch. 11.3 for the explicit definition of the q-order correlation sum.

correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)

If εs is a vector, C_q is calculated for each ε ∈ εs. If also q=2, some strong optimizations are done, but this requires the allocation of a matrix of size N×N. If this is larger than your available memory please use instead:

[correlationsum(..., ε) for ε in εs]

See grassberger for more. See also takens_best_estimate.
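A minimal usage sketch (the radii and the Theiler window are arbitrary choices for illustration):

using DynamicalSystems
X = trajectory(Systems.henon(), 20_000)
εs = 10.0 .^ range(-2, 0; length = 12)    # radii must be ordered
Cs = correlationsum(X, εs; w = 5)
Δ = linear_region(log.(εs), log.(Cs))[2]  # slope ≈ correlation dimension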

ChaosTools.grassberger (Function)
grassberger(data, εs = estimate_boxsizes(data); kwargs...) → D_C

Use the method of Grassberger and Procaccia[Grassberger1983], and the correction by Theiler[Theiler1986], to estimate the correlation dimension D_C of the given data.

This function does something extremely simple:

cm = correlationsum(data, εs; kwargs...)
return linear_region(log.(εs), log.(cm))[2]

i.e. it calculates correlationsum for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε). See generalized_dim for a more thorough explanation.

See also takens_best_estimate.
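For instance, a minimal sketch with the same arbitrary example system as above:

using DynamicalSystems
X = trajectory(Systems.henon(), 50_000)
D_C = grassberger(X)   # radii default to estimate_boxsizes(X)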

ChaosTools.boxed_correlationsum (Function)
boxed_correlationsum(data, εs, r0 = maximum(εs); q = 2, M = size(data, 2), w = 0)
boxed_correlationsum(data; q = 2, M = size(data, 2), w = 0)

Estimate the box assisted q-order correlation sum[Kantz2003] out of a dataset data for radii εs by splitting the data into boxes of size r0 beforehand.

Description

C_q(ε) is calculated for every ε ∈ εs and for each of the boxes, and the results are then summed up. If M is unequal to the dimension of the data, only the first M dimensions are considered for the box distribution (this is called the prism-assisted version). The method of splitting the data into boxes was implemented according to Theiler[Theiler1987]. If only a dataset is given, the radii εs and boxsize r0 are chosen by calculating estimate_r0_buenoorovio.

w is the Theiler window.

The function is explicitly optimized for q = 2 and becomes quite slow for q ≠ 2.

See correlationsum for the definition of C_q and also data_boxing.
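A usage sketch analogous to correlationsum (all parameter values are arbitrary illustrations):

using DynamicalSystems
X = trajectory(Systems.henon(), 50_000)
εs = 10.0 .^ range(-2, 0; length = 12)
Cs = boxed_correlationsum(X, εs; w = 5)   # r0 defaults to maximum(εs)
Δ = linear_region(log.(εs), log.(Cs))[2]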

ChaosTools.data_boxing (Function)
data_boxing(data, r0, M = size(data, 2))

Distribute the data points into boxes of size r0. Return box positions and the contents of each box as two separate vectors. Implemented according to the paper by Theiler[Theiler1987] improving the algorithm by Grassberger and Procaccia[Grassberger1983]. If M is smaller than the dimension of the data, only the first M dimensions are considered for the distribution into boxes.

See also: boxed_correlationsum.
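A minimal sketch (the box size is arbitrary; that contents holds the indices of the points in each box is an assumption based on the description above):

using DynamicalSystems
X = trajectory(Systems.henon(), 10_000)
boxes, contents = data_boxing(X, 0.1)  # boxes of size r0 = 0.1
# boxes: positions of the (non-empty) boxes
# contents: what falls into each box (assumed: indices of the points of X)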

ChaosTools.correlationsum_fixedmass (Function)
correlationsum_fixedmass(data, max_j; metric = Euclidean(), M = length(data)) → rs, ys

A fixed mass algorithm for the calculation of the fractal dimension $\Delta$ according to [^Grassberger 1988], with max_j the maximum number of neighbours that should be considered for the calculation. M defines the number of points considered for the calculation; the default is the whole data set.

Implements

\[\Delta \overline{\log r^{(j)}} \sim \Psi(j) - \log N\]

where $\Psi(j) = \frac{\text{d} \log \Gamma(j)}{\text{d} j}$, rs = $\overline{\log r^{(j)}}$ and ys = $\Psi(j) - \log N$.

$\Delta$ can be computed by using linear_region(rs, ys).
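A minimal usage sketch (the system and max_j are arbitrary choices for illustration):

using DynamicalSystems
X = trajectory(Systems.henon(), 20_000)
rs, ys = correlationsum_fixedmass(X, 64)  # consider up to 64 neighbours
Δ = linear_region(rs, ys)[2]              # slope ≈ fractal dimension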

[^Grassberger 1988]: Peter Grassberger (1988) Finite sample Corrections to Entropy and Dimension Estimates, Physics Letters A 128(6-7)

ChaosTools.takens_best_estimate (Function)
takens_best_estimate(X, εmax, metric = Chebyshev(), εmin = 0) → D_C, D_C_95u, D_C_95l

Use the so-called "Takens' best estimate" [Takens1985][Theiler1988] method for estimating the correlation dimension D_C and the upper (D_C_95u) and lower (D_C_95l) confidence limit for the given dataset X.

The original formula is

\[D_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]

where $C$ is the correlationsum and $\epsilon_\text{max}$ is an upper cutoff. Here we use the latter expression

\[D_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]

where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].

The confidence limits are estimated from the log-likelihood function by finding the values of D_C where the function has fallen by 2 from its maximum, see e.g. [Barlow], chapter 5.3. Because the CLT does not apply (no independent measurements), the limits are not necessarily symmetric.

According to [Borovkova1999], introducing a lower cutoff εmin can make the algorithm more stable (no divergence), this option is given but defaults to zero.

If X comes from a delay coordinates embedding of a timeseries x, a recommended value for $\epsilon_\text{max}$ is std(x)/4.
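A minimal sketch following this recommendation (the system and the embedding parameters d = 4, τ = 15 are arbitrary choices here):

using DynamicalSystems, Statistics
x = trajectory(Systems.roessler(), 500.0; Ttr = 10.0)[:, 1]  # sample timeseries
X = embed(x, 4, 15)                                          # delay embedding
D_C, D_C_95u, D_C_95l = takens_best_estimate(X, std(x)/4)    # εmax = std(x)/4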

ChaosTools.kernelprob (Function)
kernelprob(X, ε; norm = Euclidean()) → p::Probabilities

Associate each point in X (Dataset or timeseries) with a probability p using the "kernel estimation" (also called "nearest neighbor kernel estimation" and other names):

\[p_j = \frac{1}{N}\sum_{i=1}^N B(||X_i - X_j|| < \epsilon)\]

where $N$ is its length and $B$ gives 1 if the argument is true.

See also genentropy and correlationsum. kernelprob is equivalent to probabilities(X, NaiveKernel(ε, TreeDistance(norm))).
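A minimal usage sketch (ε = 0.05 is an arbitrary choice; feeding the result to genentropy assumes its method for Probabilities):

using DynamicalSystems
X = trajectory(Systems.henon(), 5_000)
p = kernelprob(X, 0.05)
H = genentropy(p; q = 1)   # entropy of the kernel-estimated probabilities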

Kaplan-Yorke Dimension

ChaosTools.kaplanyorke_dim (Function)
kaplanyorke_dim(λs::AbstractVector)

Calculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension[Kaplan1970].

Description

The Kaplan-Yorke dimension is simply the point where cumsum(λs) becomes zero (interpolated):

\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]

If the sum of the exponents never becomes negative the function will return the length of the input vector.
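As a hand-worked illustration with arbitrary exponents λs = [0.5, 0.0, -1.0]: the cumulative sums are 0.5, 0.5, -0.5, so k = 2 and D_KY = 2 + 0.5/1.0 = 2.5.

using DynamicalSystems
kaplanyorke_dim([0.5, 0.0, -1.0])   # returns 2.5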

Useful in combination with lyapunovspectrum.

Notice that calling this function requires you to pass the Lyapunov exponents in an ordered vector form (largest to smallest). Example:

using DynamicalSystems
hen = Systems.henon()
D_kp = kaplanyorke_dim(lyapunovspectrum(hen, 100000))
1.2587105621373085