# Fractal Dimension

There are numerous methods one can use to calculate a so-called "dimension" of a dataset, which in the context of dynamical systems is called the fractal dimension. Several computationally feasible variants of the fractal dimension exist, and a simple usage example is shown in the Fractal dimension example subsection.

## Generalized dimension

Based on the definition of the generalized entropy (`genentropy`), one can calculate an appropriate dimension, called the *generalized dimension*:

`ChaosTools.generalized_dim` — Function

`generalized_dim(dataset [, sizes]; q = 1, base = MathConstants.e) -> Δ_q`

Return the `q`-order generalized dimension of the `dataset`, by calculating the `genentropy` for each `ε ∈ sizes`.

The case of `q = 0` is often called "capacity" or "box-counting" dimension, while `q = 1` is the "information" dimension.

**Description**

The returned dimension is approximated by the (inverse) power law exponent of the scaling of the `genentropy` $H_q$ versus the box size `ε`, where `ε ∈ sizes`:

\[H_q \sim -\Delta_q\log(\varepsilon)\]

Calling this function performs a lot of automated steps:

- A vector of box sizes is decided by calling `sizes = estimate_boxsizes(dataset)`, if `sizes` is not given.
- For each element of `sizes` the appropriate entropy is calculated, through `H = genentropy.(Ref(dataset), sizes; q, base)`. Let `x = -log.(sizes)`.
- The curve `H(x)` is decomposed into linear regions, using `linear_regions(x, H)`.
- The biggest linear region is chosen, and a fit for the slope of that region is performed using the function `linear_region`, which does a simple linear regression fit using `linreg`. This slope is the return value of `generalized_dim`.

By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result.

`ChaosTools.molteno_dim` — Function

`molteno_dim(data::Dataset; k0 = 10, q = 1.0, base = ℯ)`

Calculate the generalized dimension using the algorithm for box division defined by Molteno^{[Molteno1993]}.

**Description**

Divide the data into boxes, with each new box having half the side length of the former box, using `molteno_boxing`. Stop when the number of points divided by the number of filled boxes falls below `k0`. Then the generalized dimension can be calculated by using `genentropy` to compute the entropy for each box size (also considering possible approximations) and fitting this against the logarithm of one over the box size using `linear_region`.

This algorithm is faster than the traditional approach of using `probabilities`, but it is only suited for low-dimensional data, since it divides each box into `2^D` new boxes where `D` is the dimension. For large `D` this leads to only a few box divisions before the threshold is reached and the division stops. This results in a low number of data points to fit the dimension to, and thereby a poor estimate.

`ChaosTools.molteno_boxing` — Function

`molteno_boxing(data::Dataset; k0 = 10) → (probs, εs)`

Distribute the `data` into boxes whose size is halved in each step. Stop if the average number of points per filled box falls below the threshold `k0`.

Return `probs`, a vector of `Probabilities` for the different box sizes, and the corresponding box sizes `εs`.

**Description**

Project the `data` onto the whole interval of numbers that is covered by `UInt64`. This projected data is distributed into boxes whose size decreases by a factor of 2 in each step. For each box that contains more than one point, `2^D` new boxes are created, where `D` is the dimension of the data.

The process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below `k0`. The box sizes `εs` are calculated and returned together with the `probs`.

See^{[Molteno1993]} for more.
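As a minimal usage sketch (not from the original docs; the Hénon map and the keyword values are illustrative choices):

```
using DynamicalSystems

hen = Systems.henon()
A = trajectory(hen, 100_000)          # Dataset of the Hénon attractor

# one-call version: box-assisted information dimension
D1 = molteno_dim(A; k0 = 10, q = 1)

# lower-level version: probabilities per box size, then entropies, then a fit
probs, εs = molteno_boxing(A; k0 = 10)
Hs = [genentropy(p; q = 1) for p in probs]
D1_manual = linear_region(-log.(εs), Hs)[2]
```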

As stated clearly by the documentation string, calling `generalized_dim` performs a lot of automated steps by calling other functions (see below) with default arguments. It is actually more like a convenient bundle than an actual function, and therefore you should be careful when considering the validity of the returned number.

## Fractal dimension example

For an example of using entropies to compute the dimension of an attractor let's use everyone's favorite system:

```
using DynamicalSystems, CairoMakie
lor = Systems.lorenz()
```

```
3-dimensional continuous dynamical system
state: [0.0, 10.0, 0.0]
rule f: loop
in-place? false
jacobian: loop_jac
parameters: [10.0, 28.0, 2.66667]
```

Our goal is to compute entropies for many different partition sizes `ε`, so let's get down to it:

```
tr = trajectory(lor, 100.0; Ttr = 10.0)
ες = ℯ .^ (-3.5:0.5:3.5) # semi-random guess
Hs = genentropy.(Ref(tr), ες; q = 1)
```

```
15-element Vector{Float64}:
9.209470057954452
9.204618512845059
9.18923218406953
9.131923502389052
8.988193092161229
8.650059940707688
8.040368989361795
7.230911670856004
6.294599083922672
5.329018829494319
4.394206819223289
3.4856074401846757
2.5082750581883295
1.775648810506457
0.4276980912260111
```

```
xs = @. -log(ες)
fig = Figure(resolution = (500,500))
ax = Axis(fig[1,1]; ylabel = L"H_1", xlabel = L"-\log (\epsilon)")
lines!(ax, xs, Hs)
fig
```

The slope of the linear scaling region of the above plot is the generalized dimension (of order q = 1) for the attractor of the Lorenz system.

Given that we *see* the plot, we can estimate where the linear scaling region starts and ends. However, we can use the function `linear_region` to get an estimate of the result as well. First let's visualize what it does:

```
lrs, slopes = linear_regions(xs, Hs, tol = 0.25)
fig = Figure(resolution = (500,500))
ax = Axis(fig[1,1]; ylabel = L"H_1", xlabel = L"-\log (\epsilon)")
for i in 1:length(lrs)-1
scatterlines!(ax, xs[lrs[i]:lrs[i+1]], Hs[lrs[i]:lrs[i+1]])
end
fig
```

The `linear_region` function computes the slope of the largest region:

`Δ = linear_region(xs, Hs)[2]`

`1.8292383632573987`

This result is an approximation of the information dimension (because we used `q = 1`) of the Lorenz attractor.

The above pipeline is bundled in `generalized_dim`. For example, the dimension of the strange attractor of the `Systems.henon` map, following the above approach but taking automated steps, is:

```
using DynamicalSystems
hen = Systems.henon()
tr = trajectory(hen, 200000)
D_hen = generalized_dim(tr; q = 1)
```

`1.221186840581862`

As a side note, be sure that you have enough data points, otherwise the values you will get will never be correct, as is demonstrated by J.-P. Eckmann and D. Ruelle (see Physica D **56**, pp 185-187 (1992)).

## Linear scaling regions

This subsection documents `linear_regions` and other utilities, especially `linreg`, used in both `generalized_dim` and `grassberger_dim`.

`ChaosTools.linear_regions` — Function

`linear_regions(x, y; dxi::Int = 1, tol = 0.25) -> (lrs, tangents)`

Identify regions where the curve `y(x)` is linear, by scanning the `x`-axis every `dxi` indices sequentially (e.g. at `x[1] to x[5], x[5] to x[10], x[10] to x[15]` and so on if `dxi=5`).

If the slope (calculated via linear regression) of a region of width `dxi` is approximately equal to that of the previous region, within tolerance `tol`, then these two regions belong to the same linear region.

Return the indices of `x` that correspond to linear regions, `lrs`, and the *correct* `tangents` at each region (obtained via a second linear regression at each accumulated region).

`ChaosTools.linear_region` — Function

`linear_region(x, y; kwargs...) -> ((ind1, ind2), slope)`

Call `linear_regions` and identify and return the largest linear region and its slope. The region starts and stops at `x[ind1:ind2]`.

The keywords `dxi, tol` are propagated as-is to `linear_regions`. The keyword `ignore_saturation = true` ignores saturation that (sometimes) happens at the start and end of the curve `y(x)`, where the curve flattens. The keyword `sat = 0.01` decides what counts as saturation (while `abs(y[i]-y[i+1]) < sat` we are in a saturation regime).

The keyword `warning = true` prints a warning if the linear region is less than 1/3 of the available x-axis.

`ChaosTools.linreg` — Function

`linreg(x, y) -> a, b`

Perform a linear regression to find the best coefficients so that the curve `z = a + b*x` has the least squared error with `y`.
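For illustration, a tiny sketch with synthetic data (not from the docs):

```
using ChaosTools

x = collect(0.0:0.1:1.0)
y = 2.5 .+ 1.7 .* x               # perfectly linear data
a, b = ChaosTools.linreg(x, y)    # recovers a ≈ 2.5, b ≈ 1.7
```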

`ChaosTools.estimate_boxsizes` — Function

`estimate_boxsizes(A::Dataset; kwargs...) → εs`

Return `k` exponentially spaced values, `εs = base .^ range(lower + w, upper + z; length = k)`, that are a good estimate for the sizes ε used in calculating a fractal dimension. It is strongly recommended to `regularize` the input dataset `A` before using this function.

Let `d₋` be the minimum pair-wise distance in `A` and `d₊` the average total length of `A` along each of the dimensions of `A`. Then `lower = log(base, d₋)` and `upper = log(base, d₊)`. Because by default `w = 1, z = -1`, the returned sizes are an order of magnitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.

**Keywords**

- `w = 1, z = -1, k = 20`: as explained above.
- `base = MathConstants.e`: the base used in the `log` function.
- `warning = true`: print some warnings for bad estimates.
- `autoexpand = true`: if the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by setting `w -= we` and `z -= ze`. You can set different default values with the keywords `we = w, ze = z`.
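A short sketch of the intended workflow, with the Hénon map as assumed example data (not part of the original docs):

```
using DynamicalSystems

A = regularize(trajectory(Systems.henon(), 50_000))   # regularize as recommended
εs = estimate_boxsizes(A)                              # exponentially spaced sizes
Hs = genentropy.(Ref(A), εs; q = 0)
D0 = linear_region(-log.(εs), Hs)[2]                   # box-counting dimension estimate
```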

## Correlation sum based dimension

`ChaosTools.correlationsum` — Function

`correlationsum(X, ε; w = 0, norm = Euclidean(), q = 2) → C_q(ε)`

Calculate the `q`-order correlation sum of `X` (`Dataset` or timeseries) for a given radius `ε` and `norm`. The keyword `show_progress = true` can be used to display a progress bar for large `X`.

`correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)`

If `εs` is a vector, `C_q` is calculated for each `ε ∈ εs` more efficiently. If also `q = 2`, we attempt to do further optimizations, if the allocation of a matrix of size `N×N` is possible.

The function `boxed_correlationsum` is faster and should be preferred over this one.

**Description**

The correlation sum is defined as follows for `q = 2`:

\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} B(||X_i - X_j|| < \epsilon)\]

and as follows for `q ≠ 2`:

\[C_q(\epsilon) = \left[\frac{1}{\alpha} \sum_{i=w+1}^{N-w} \left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\right]^{1/(q-1)}\]

where

\[\alpha = (N-2w)(N-2w-1)^{(q-1)}\]

where $N$ is the length of `X` and $B$ gives 1 if its argument is `true`. `w` is the Theiler window. See the article of Grassberger for the general definition ^{[Grassberger2007]} and the book "Nonlinear Time Series Analysis" ^{[Kantz2003]}, Ch. 6, for a discussion around choosing the best values for `w`, and Ch. 11.3 for the explicit definition of the q-order correlation sum.

The scaling of $\log C_q$ versus $\log \epsilon$ approximates the q-order generalized (Rényi) dimension.
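As a sketch of how this scaling is used in practice (the radii and Theiler window below are ad-hoc choices, not values from the docs):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 20_000)
εs = 10.0 .^ range(-4, 0; length = 20)      # ad-hoc radii
Cs = correlationsum(X, εs; w = 5, q = 2)    # w: Theiler window
valid = Cs .> 0                             # drop empty radii before taking logs
Δ_C = linear_region(log.(εs[valid]), log.(Cs[valid]))[2]
```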

`ChaosTools.grassberger_dim` — Function

`grassberger_dim(data, εs = estimate_boxsizes(data); kwargs...) → D_C`

Use the method of Grassberger and Procaccia^{[Grassberger1983]}, and the correction by Theiler^{[Theiler1986]}, to estimate the correlation dimension `D_C` of the given `data`.

This function does something extremely simple:

```
cm = correlationsum(data, εs; kwargs...)
return linear_region(log.(εs), log.(cm))[2]
```

i.e. it calculates the `correlationsum` for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε). See `generalized_dim` for a more thorough explanation.

See also `takens_best_estimate`.
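Usage is a one-liner; a hedged sketch (the keyword `w` is simply forwarded to `correlationsum`, and the example data are an assumption):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 20_000)
D_C = grassberger_dim(X; w = 5)   # radii default to estimate_boxsizes(X)
```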

`ChaosTools.boxed_correlationsum` — Function

`boxed_correlationsum(X::Dataset, εs, r0 = maximum(εs); kwargs...) → Cs`

Estimate the box-assisted q-order correlation sum `Cs` out of a dataset `X` for each radius in `εs`, by splitting the data into boxes of size `r0` beforehand. This method is much faster than `correlationsum`, **provided that** the box size `r0` is significantly smaller than the attractor length. A good estimate for `r0` is `estimate_r0_buenoorovio`.

See `correlationsum` for the definition of the correlation sum.

`boxed_correlationsum(X::Dataset; kwargs...) → εs, Cs`

In this method the minimum inter-point distance and `estimate_r0_buenoorovio` are used to estimate good `εs` for the calculation, which are also returned.

**Keywords**

- `q = 2`: the order of the correlation sum.
- `P = autoprismdim(data)`: the prism dimension.
- `w = 0`: the Theiler window.
- `show_progress = false`: whether to display a progress bar for the calculation.

**Description**

`C_q(ε)` is calculated for every `ε ∈ εs` and for each of the boxes, to then be summed up afterwards. The method of splitting the data into boxes was implemented according to Theiler^{[Theiler1987]}. `w` is the Theiler window. `P` is the prism dimension. If `P` is unequal to the dimension of the data, only the first `P` dimensions are considered for the box distribution (this is called the prism-assisted version). By default `P` is chosen automatically.

The function is explicitly optimized for `q = 2` but becomes quite slow for `q ≠ 2`.

See `correlationsum` for the definition of `C_q`, and also `data_boxing` to use the algorithm that splits data into boxes.
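A minimal sketch of the convenience method that also estimates the radii (example data and Theiler window are assumptions):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 50_000)
εs, Cs = boxed_correlationsum(X; w = 5)    # εs are estimated internally
valid = Cs .> 0
Δ_C = linear_region(log.(εs[valid]), log.(Cs[valid]))[2]
```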

`ChaosTools.data_boxing` — Function

`data_boxing(data, r0, P) → boxes, contents`

Distribute the `data` points into boxes of size `r0`. Return box positions and the contents of each box as two separate vectors. Implemented according to the paper by Theiler^{[Theiler1987]}, improving the algorithm by Grassberger and Procaccia^{[Grassberger1983]}. If `P` is smaller than the dimension of the data, only the first `P` dimensions are considered for the distribution into boxes.

See also: `boxed_correlationsum`.
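For illustration only, a sketch of inspecting the boxing directly (the box size `r0 = 0.1` is an arbitrary choice):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 10_000)
boxes, contents = data_boxing(X, 0.1, 2)   # P = 2: both coordinates are used
nfilled = length(boxes)                    # number of filled boxes
```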

`ChaosTools.estimate_r0_buenoorovio` — Function

`estimate_r0_buenoorovio(X::Dataset, P = autoprismdim(X))`

Estimate a reasonable size for boxing the time series `X`, proposed by Bueno-Orovio and Pérez-García^{[Bueno2007]}, before calculating the correlation dimension as presented by Theiler^{[Theiler1983]}. If, instead of boxes, prisms are chosen, everything stays the same but `P` is the dimension of the prism. To do so, the dimension `ν` is estimated by running the algorithm by Grassberger and Procaccia^{[Grassberger1983]} with `√N` points, where `N` is the total number of data points. An effective size `ℓ` of the attractor is calculated by boxing a small subset of size `N/10` into boxes of sidelength `r_ℓ` and counting the number of filled boxes `η_ℓ`:

\[\ell = r_\ell \eta_\ell ^{1/\nu}\]

The optimal number of filled boxes `η_opt` is calculated by minimising the number of calculations:

\[\eta_\textrm{opt} = N^{2/3}\cdot \left(\frac{3^\nu - 1}{3^P - 1}\right)^{1/2}.\]

`P` is the dimension of the data or the number of edges on the prism that don't span the whole dataset.

Then the optimal boxsize $r_0$ computes as

\[r_0 = \ell / \eta_\textrm{opt}^{1/\nu}.\]

`ChaosTools.estimate_r0_theiler` — Function

`estimate_r0_theiler(X::Dataset) → r0, ε0`

Estimate a reasonable size for boxing the data `X` before calculating the `boxed_correlationsum`, as proposed by Theiler^{[Theiler1987]}. Return the boxing size `r0` and the minimum inter-point distance in `X`, `ε0`.

To do so, the dimension is estimated by running the algorithm by Grassberger and Procaccia^{[Grassberger1983]} with `√N` points, where `N` is the total number of data points. Then the optimal boxsize $r_0$ computes as

\[r_0 = R (2/N)^{1/\nu}\]

where $R$ is the size of the chaotic attractor and $\nu$ is the estimated dimension.
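A sketch combining this estimator with `boxed_correlationsum` (the choice of 16 radii between `ε0` and `r0` is an assumption):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 50_000)
r0, ε0 = estimate_r0_theiler(X)
εs = 10.0 .^ range(log10(ε0), log10(r0); length = 16)   # radii between ε0 and r0
Cs = boxed_correlationsum(X, εs, r0; w = 5)
```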

`ChaosTools.correlationsum_fixedmass` — Function

`correlationsum_fixedmass(data, max_j; metric = Euclidean(), M = length(data)) → rs, ys`

A fixed mass algorithm for the calculation of a fractal dimension $\Delta$, with `max_j` the maximum number of neighbours that should be considered for the calculation. `M` defines the number of points considered for the averaging of distances.

**Description**

The calculated $\Delta$ approximates the information dimension. The implementation here is due to ^{[Grassberger1988]}, which defines

\[\Delta \times \overline{\log r^{(j)}} \sim Ψ(j) - \log N\]

where $\Psi(j) = \frac{\text{d} \log Γ(j)}{\text{d} j}$ is the digamma function, `rs` $= \overline{\log r^{(j)}}$ and `ys` $= \Psi(j) - \log N$ ($N$ is the length of the data).

$\Delta$ can be computed by using `linear_region(rs, ys)`.
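A minimal sketch (`max_j = 32` and `M = 1000` are arbitrary illustrative choices):

```
using DynamicalSystems

X = trajectory(Systems.henon(), 20_000)
rs, ys = correlationsum_fixedmass(X, 32; M = 1000)
Δ₁ = linear_region(rs, ys)[2]   # approximates the information dimension
```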

## Takens' best estimate

`ChaosTools.takens_best_estimate` — Function

`takens_best_estimate(X, εmax, metric = Chebyshev(), εmin = 0) → D_C, D_C_95u, D_C_95l`

Use the so-called "Takens' best estimate" method^{[Takens1985]}^{[Theiler1988]} for estimating the correlation dimension `D_C` and the upper (`D_C_95u`) and lower (`D_C_95l`) confidence limits for the given dataset `X`.

The original formula is

\[D_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]

where $C$ is the `correlationsum` and $\epsilon_\text{max}$ is an upper cutoff. Here we use the following expression instead:

\[D_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]

where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in ^{[Borovkova1999]}.

The confidence limits are estimated from the log-likelihood function by finding the values of `D_C` where the function has fallen by 2 from its maximum; see e.g. ^{[Barlow]}, chapter 5.3. Because the CLT does not apply (no independent measurements), the limits are not necessarily symmetric.

According to ^{[Borovkova1999]}, introducing a lower cutoff `εmin` can make the algorithm more stable (no divergence); this option is given but defaults to zero.

If `X` comes from a delay coordinates embedding of a timeseries `x`, a recommended value for $\epsilon_\text{max}$ is `std(x)/4`.
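A sketch for a delay embedding of the Hénon x-timeseries (the embedding parameters are ad-hoc illustrative choices):

```
using DynamicalSystems, Statistics

x = trajectory(Systems.henon(), 20_000)[:, 1]   # a scalar timeseries
R = embed(x, 2, 1)                              # delay coordinates embedding
D_C, D_C_95u, D_C_95l = takens_best_estimate(R, std(x)/4)
```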

## Kaplan-Yorke Dimension

`ChaosTools.kaplanyorke_dim` — Function

`kaplanyorke_dim(λs::AbstractVector)`

Calculate the Kaplan-Yorke dimension, a.k.a. the Lyapunov dimension^{[Kaplan1970]}.

**Description**

The Kaplan-Yorke dimension is simply the point where `cumsum(λs)` becomes zero (interpolated):

\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]

If the sum of the exponents never becomes negative the function will return the length of the input vector.

Useful in combination with `lyapunovspectrum`.

Notice that calling this function requires you to pass the Lyapunov exponents in an ordered vector form (largest to smallest). Example:

```
using DynamicalSystems
hen = Systems.henon()
D_kp = kaplanyorke_dim(lyapunovspectrum(hen, 100000))
```

`1.2587002728916798`

- [Molteno1993] Molteno, T. C. A., Fast O(N) box-counting algorithm for estimating dimensions. Phys. Rev. E 48, R3263(R) (1993)
- [Grassberger2007] Peter Grassberger (2007) Grassberger-Procaccia algorithm. Scholarpedia, 2(5):3043.
- [Kantz2003] Kantz, H., & Schreiber, T. (2003). Nonlinear Time Series Analysis, Cambridge University Press.
- [Grassberger1983] Grassberger and Procaccia, Characterization of strange attractors, PRL 50 (1983)
- [Theiler1986] Theiler, Spurious dimension from correlation algorithms applied to limited time-series data. Physical Review A, 34
- [Theiler1987] Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36
- [Bueno2007] Bueno-Orovio and Pérez-García, Enhanced box and prism assisted algorithms for computing the correlation dimension. Chaos, Solitons & Fractals, 34(5)
- [Grassberger1988] Peter Grassberger (1988) Finite sample corrections to entropy and dimension estimates, Physics Letters A 128(6-7)
- [Takens1985] Takens, On the numerical determination of the dimension of an attractor, in: B.H.W. Braaksma, B.L.J.F. Takens (Eds.), Dynamical Systems and Bifurcations, Lecture Notes in Mathematics, Springer, Berlin, 1985, pp. 99–106.
- [Theiler1988] Theiler, Lacunarity in a best estimator of fractal dimension. Physics Letters A, 133(4–5)
- [Borovkova1999] Borovkova et al., Consistency of the Takens estimator for the correlation dimension. The Annals of Applied Probability, 9, 05 1999.
- [Barlow] Barlow, R., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences. Vol 29. John Wiley & Sons, 1993
- [Kaplan1970] J. Kaplan & J. Yorke, *Chaotic behavior of multidimensional difference equations*, Lecture Notes in Mathematics vol. **730**, Springer (1979)