# Fractal Dimension

There are numerous methods that one can use to calculate a so-called "dimension" of a dataset which in the context of dynamical systems is called the Fractal dimension. Several variants of a computationally feasible fractal dimension exist.

## Generalized dimension

Based on the definition of the Generalized entropy, one can calculate an appropriate dimension, called *generalized dimension*:

`ChaosTools.generalized_dim`

— Function`generalized_dim(dataset [, sizes]; q = 1, base = MathConstants.e) -> D_α`

Return the `α`

order generalized dimension of the `dataset`

, by calculating the `genentropy`

for each `ε ∈ sizes`

.

The case of `α=0`

is often called "capacity" or "box-counting" dimension.

**Description**

The returned dimension is approximated by the (inverse) power law exponent of the scaling of the `genentropy`

versus the box size `ε`

, where `ε ∈ sizes`

.

Calling this function performs a lot of automated steps:

- A vector of box sizes is decided by calling
`sizes = estimate_boxsizes(dataset)`

, if`sizes`

is not given. - For each element of
`sizes`

the appropriate entropy is calculated, through`h = genentropy.(Ref(dataset), sizes; α, base)`

. Let`x = -log.(sizes)`

. - The curve
`h(x)`

is decomposed into linear regions, using`linear_regions`

`(x, h)`

. - The biggest linear region is chosen, and a fit for the slope of that region is performed using the function
`linear_region`

, which does a simple linear regression fit using`linreg`

. This slope is the return value of`generalized_dim`

.

By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result.

The following aliases are provided:

- α = 0 :
`boxcounting_dim`

,`capacity_dim`

- α = 1 :
`information_dim`

As stated clearly by the documentation string, calling `generalized_dim`

performs a lot of automated steps by calling other functions (see below) with default arguments. It is actually more like a convenient bundle than an actual function and therefore you should be careful when considering the validity of the returned number.

### Example

For an example of using entropies to compute the dimension of an attractor let's use everyone's favorite system:

```
using DynamicalSystems, PyPlot
lor = Systems.lorenz()
```

3-dimensional continuous dynamical system state: [0.0, 10.0, 0.0] e.o.m.: loop in-place? false jacobian: loop_jac parameters: [10.0, 28.0, 2.6666666666666665]

Our goal is to compute entropies for many different partition sizes `ε`

, so let's get down to it:

```
tr = trajectory(lor, 100.0; Ttr = 10.0)
ες = ℯ .^ (-3.5:0.5:3.5) # semi-random guess
Hs = genentropy.(Ref(tr), ες; q = 1)
```

15-element Array{Float64,1}: 9.210440366976329 9.208499748932574 9.195140334192946 9.136445496766951 8.998905310194775 8.705004397152349 8.1305312663553 7.331853533923084 6.4086579132355626 5.453352354803625 4.485476740802796 3.5260626238745894 2.606754871093233 1.8789633825490644 0.5375831514462661

```
xs = @. -log(ες)
figure()
plot(xs, Hs)
ylabel("\$H_1\$")
xlabel("\$-\\log (\\epsilon)\$");
```

The slope of the linear scaling region of the above plot is the generalized dimension (of order q = 2) for the attractor of the Lorenz system.

Given that we *see* the plot, we can estimate where the linear scaling region starts and ends. However, we can use the function `linear_region`

to get an estimate of the result as well. First let's visualize what it does:

```
lrs, slopes = linear_regions(xs, Hs, tol = 0.25)
figure()
for i in 1:length(lrs)-1
plot(xs[lrs[i]:lrs[i+1]], Hs[lrs[i]:lrs[i+1]], marker = "o")
end
ylabel("\$H_1\$")
xlabel("\$-\\log (\\epsilon)\$");
```

The `linear_region`

function computes the slope of the largest region:

`linear_region(xs, Hs)[2]`

1.833384047211349

This result is an approximation of the information dimension (because we used `q = 1`

) of the Lorenz attractor.

The above pipeline is bundled in `generalized_dim`

. For example, the dimension of the strange attractor of the `Systems.henon`

map, following the above approach but taking automated steps, is:

```
using DynamicalSystems
hen = Systems.henon()
tr = trajectory(hen, 200000)
D_hen = generalized_dim(tr; q = 1)
```

1.2276426984800552

As a side note, be sure that you have enough data points, otherwise the values you will get will never be correct, as is demonstrated by J.-P. Eckmann and D. Ruelle (see Physica D **56**, pp 185-187 (1992)).

## Linear scaling regions

And other utilities, especially `linreg`

, used in both [`generalized_dim`

] and `grassberger`

.

`ChaosTools.linear_regions`

— Function`linear_regions(x, y; dxi::Int = 1, tol = 0.25) -> (lrs, tangents)`

Identify regions where the curve `y(x)`

is linear, by scanning the `x`

-axis every `dxi`

indices sequentially (e.g. at `x[1] to x[5], x[5] to x[10], x[10] to x[15]`

and so on if `dxi=5`

).

If the slope (calculated via linear regression) of a region of width `dxi`

is approximatelly equal to that of the previous region, within tolerance `tol`

, then these two regions belong to the same linear region.

Return the indices of `x`

that correspond to linear regions, `lrs`

, and the *correct* `tangents`

at each region (obtained via a second linear regression at each accumulated region).

`ChaosTools.linear_region`

— Function`linear_region(x, y; kwargs...) -> ((ind1, ind2), slope)`

Call `linear_regions`

and identify and return the largest linear region and its slope. The region starts and stops at `x[ind1:ind2]`

.

The keywords `dxi, tol`

are propagated as-is to `linear_regions`

. The keyword `ignore_saturation = true`

ignores saturation that (typically) happens at the final points of the curve `y(x)`

, where the curve flattens out.

`ChaosTools.linreg`

— Function`linreg(x, y) -> a, b`

Perform a linear regression to find the best coefficients so that the curve: `z = a + b*x`

has the least squared error with `y`

.

`ChaosTools.estimate_boxsizes`

— Function`estimate_boxsizes(A::Dataset; kwargs...)`

Return `k`

exponentially spaced values: `base .^ range(lower + w, upper + z; length = k)`

, that are a good estimate for sizes ε that are used in calculating a Fractal Dimension.

Let `d₋`

be the minimum pair-wise distance in `A`

and `d₊`

the length of the diagonal of the hypercube that contains `A`

. Then `lower = log(base, d₋)`

and `upper = log(base, d₊)`

. Because by default `w=1, z=-1`

, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.

**Keywords**

`w = 1, z = -1, k = 20`

: as explained above.`base = MathConstants.e`

: the base used in the`log`

function.

## Correlation dimension

`ChaosTools.kernelprob`

— Function`kernelprob(X, ε; norm = Euclidean()) → p::Probabilities`

Associate each point in `X`

(`Dataset`

or timesries) with a probability `p`

using the "kernel estimation" (also called "nearest neighbor kernel estimation" and other names):

\[p_j = \frac{1}{N}\sum_{i=1}^N B(||X_i - X_j|| < \epsilon)\]

where $N$ is its length and $B$ gives 1 if the argument is `true`

.

See also `genentropy`

and `correlationsum`

. `kernelprob`

is equivalent with `probabilities(X, NaiveKernel(ε, TreeDistance(norm)))`

.

`ChaosTools.correlationsum`

— Function`correlationsum(X, ε::Real; w = 0, norm = Euclidean(), q = 2) → C_q(ε)`

Calculate the `q`

-order correlation sum of `X`

(`Dataset`

or timeseries) for a given radius `ε`

and `norm`

, using the formula:

\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} B(||X_i - X_j|| < \epsilon)\]

for `q=2`

and

\[C_q(\epsilon) = \frac{1}{(N-2w)(N-2w-1)^{(q-1)}} \sum_{i=1}^N\left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\]

for `q≠2`

, where $N$ is its length and $B$ gives 1 if the argument is `true`

. `w`

is the Theiler window. If `ε`

is a vector its values have to be ordered. See the book "Nonlinear Time Series Analysis"^{[Kantz2003]}, Ch. 6, for a discussion around `w`

and choosing best values and Ch. 11.3 for the definition of the q-order correlationsum.

`correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)`

If `εs`

is a vector, `C_q`

is calculated for each `ε ∈ εs`

. If also `q=2`

, some strong optimizations are done, but this requires the allocation a matrix of size `N×N`

. If this is larger than your available memory please use instead:

`[correlationsum(..., ε) for ε in εs]`

See `grassberger`

for more. See also `takens_best_estimate`

.

`ChaosTools.grassberger`

— Function`grassberger(data, εs = estimate_boxsizes(data); kwargs...) → D_C`

Use the method of Grassberger and Proccacia^{[Grassberger1983]}, and the correction by Theiler^{[Theiler1986]}, to estimate the correlation dimension `D_C`

of the given `data`

.

This function does something extrely simple:

```
cm = correlationsum(data, εs; kwargs...)
return linear_region(log.(sizes), log(cm))[2]
```

i.e. it calculates `correlationsum`

for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε). See `generalized_dim`

for a more thorough explanation.

See also `takens_best_estimate`

.

`ChaosTools.takens_best_estimate`

— Function`takens_best_estimate(X, εmax, metric = Chebyshev(),εmin = 0) → D_C, D_C_95u, D_C_95l`

Use the so-called "Takens' best estimate" ^{[Takens1985]}^{[Theiler1988]} method for estimating the correlation dimension `D_C`

and the upper (`D_C_95u`

) and lower (`D_C_95l`

) confidence limit for the given dataset `X`

.

The original formula is

\[D_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]

where $C$ is the `correlationsum`

and $\epsilon_\text{max}$ is an upper cutoff. Here we use the later expression

\[D_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]

where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in ^{[Borovkova1999]}.

The confidence limits are estimated from the log-likelihood function by finding the values of `D_C`

where the function has fallen by 2 from its maximum, see e.g. ^{[Barlow]} chapter 5.3 Because the CLT does not apply (no independent measurements), the limits are not neccesarily symmetric.

According to ^{[Borovkova1999]}, introducing a lower cutoff `εmin`

can make the algorithm more stable (no divergence), this option is given but defaults to zero.

If `X`

comes from a delay coordinates embedding of a timseries `x`

, a recommended value for $\epsilon_\text{max}$ is `std(x)/4`

.

## Kaplan-Yorke Dimension

`ChaosTools.kaplanyorke_dim`

— Function`kaplanyorke_dim(λs::AbstractVector)`

Calculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension^{[Kaplan1970]}.

**Description**

The Kaplan-Yorke dimension is simply the point where `cumsum(λs)`

becomes zero (interpolated):

\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]

If the sum of the exponents never becomes negative the function will return the length of the input vector.

Useful in combination with `lyapunovspectrum`

.

Notice that calling this function requires you to pass the Lyapunov exponents in an ordered vector form (largest to smallest). Example:

```
using DynamicalSystems
hen = Systems.henon()
D_kp = kaplanyorke_dim(lyapunovspectrum(hen, 100000))
```

1.2587089939604255

- KantzKantz, H., & Schreiber, T. (2003). More about invariant quantities. In Nonlinear Time Series Analysis (pp. 197-233). Cambridge: Cambridge University Press.
- Grassberger1983Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983)
- Theiler1986Theiler, Spurious dimension from correlation algorithms applied to limited time-series data. Physical Review A, 34
- Takens1985Takens, On the numerical determination of the dimension of an attractor, in: B.H.W. Braaksma, B.L.J.F. Takens (Eds.), Dynamical Systems and Bifurcations, in: Lecture Notes in Mathematics, Springer, Berlin, 1985, pp. 99–106.
- Theiler1988Theiler, Lacunarity in a best estimator of fractal dimension. Physics Letters A, 133(4–5)
- Borovkova1999Borovkova et al., Consistency of the Takens estimator for the correlation dimension. The Annals of Applied Probability, 9, 05 1999.
- BarlowBarlow, R., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences. Vol 29. John Wiley & Sons, 1993
- Kaplan1970J. Kaplan & J. Yorke,
*Chaotic behavior of multidimensional difference equations*, Lecture Notes in Mathematics vol.**730**, Springer (1979)