Fractal Dimension
There are numerous methods that one can use to calculate a so-called "dimension" of a dataset which in the context of dynamical systems is called the Fractal dimension. Several variants of a computationally feasible fractal dimension exist and a simple usage example is shown in the Fractal dimension example subsection.
Generalized dimension
Based on the definition of the Generalized entropy (genentropy
), one can calculate an appropriate dimension, called generalized dimension:
ChaosTools.generalized_dim
— Functiongeneralized_dim(dataset [, sizes]; q = 1, base = MathConstants.e) -> Δ_q
Return the q
order generalized dimension of the dataset
, by calculating the genentropy
for each ε ∈ sizes
.
The case of q = 0
is often called "capacity" or "box-counting" dimension, while q = 1
is the "information" dimension.
Description
The returned dimension is approximated by the (inverse) power law exponent of the scaling of the genentropy
$H_q$ versus the box size ε
, where ε ∈ sizes
:
\[H_q \sim -\Delta_q\log(\varepsilon)\]
Calling this function performs a lot of automated steps:
- A vector of box sizes is decided by calling
sizes = estimate_boxsizes(dataset)
, ifsizes
is not given. - For each element of
sizes
the appropriate entropy is calculated, throughH = genentropy.(Ref(dataset), sizes; q, base)
. Letx = -log.(sizes)
. - The curve
H(x)
is decomposed into linear regions, usinglinear_regions
(x, h)
. - The biggest linear region is chosen, and a fit for the slope of that region is performed using the function
linear_region
, which does a simple linear regression fit usinglinreg
. This slope is the return value ofgeneralized_dim
.
By doing these steps one by one yourself, you can adjust the keyword arguments given to each of these function calls, refining the accuracy of the result.
ChaosTools.molteno_dim
— Functionmolteno_dim(data::Dataset; k0 = 10, q = 1.0, base = ℯ)
Calculate the generalized dimension using the algorithm for box division defined by Molteno[Molteno1993].
Description
Divide the data into boxes with each new box having half the side length of the former box using molteno_boxing
. Break if the number of points over the number of filled boxes falls below k0
. Then the generalized dimension can be calculated by using genentropy
to calculate the sum over the logarithm also considering possible approximations and fitting this to the logarithm of one over the boxsize using linear_region
.
This algorithm is faster than the traditional approach of using probabilities
, but it is only suited for low dimensional data since it divides each box into 2^D
new boxes if D
is the dimension. For large D
this leads to low numbers of box divisions before the threshold is passed and the divison stops. This results to a low number of data points to fit the dimension to and thereby a poor estimate.
ChaosTools.molteno_boxing
— Functionmolteno_boxing(data::Dataset; k0 = 10) → (probs, εs)
Distribute the data
into boxes whose size is halved in each step. Stop if the average number of points per filled box falls below the threshold k0
.
Return probs
, a vector of Propabilities
for different box sizes and the corresponding box sizes εs
.
Description
Project the data
onto the whole interval of numbers that is covered by UInt64
. This projected data is distributed into boxes whose size decreases by factor 2 in each step. For each box that contains more than one point 2^D
new boxes are created where D
is the dimension of the data.
The process of dividing the data into new boxes stops when the number of points over the number of filled boxes falls below k0
. The box sizes εs
are calculated and returned together with the probs
.
See[Molteno1993] for more.
As stated clearly by the documentation string, calling generalized_dim
performs a lot of automated steps by calling other functions (see below) with default arguments. It is actually more like a convenient bundle than an actual function and therefore you should be careful when considering the validity of the returned number.
Fractal dimension example
For an example of using entropies to compute the dimension of an attractor let's use everyone's favorite system:
using DynamicalSystems, CairoMakie
lor = Systems.lorenz()
3-dimensional continuous dynamical system
state: [0.0, 10.0, 0.0]
rule f: loop
in-place? false
jacobian: loop_jac
parameters: [10.0, 28.0, 2.66667]
Our goal is to compute entropies for many different partition sizes ε
, so let's get down to it:
tr = trajectory(lor, 100.0; Ttr = 10.0)
ες = ℯ .^ (-3.5:0.5:3.5) # semi-random guess
Hs = genentropy.(Ref(tr), ες; q = 1)
15-element Vector{Float64}:
9.210301751401776
9.208083902208905
9.194915422626233
9.137994031131077
8.990887340520297
8.6852212308104
8.121043612220996
7.310407416022282
6.390042566943293
5.417697229541516
4.450002238120835
3.511858839530932
2.5871118116002414
1.8590020089939434
0.5001745970318286
xs = @. -log(ες)
fig = Figure(resolution = (500,500))
ax = Axis(fig[1,1]; ylabel = L"H_1", xlabel = L"-\log (\epsilon)")
lines!(ax, xs, Hs)
fig
The slope of the linear scaling region of the above plot is the generalized dimension (of order q = 1) for the attractor of the Lorenz system.
Given that we see the plot, we can estimate where the linear scaling region starts and ends. However, we can use the function linear_region
to get an estimate of the result as well. First let's visualize what it does:
lrs, slopes = linear_regions(xs, Hs, tol = 0.25)
fig = Figure(resolution = (500,500))
ax = Axis(fig[1,1]; ylabel = L"H_1", xlabel = L"-\log (\epsilon)")
for i in 1:length(lrs)-1
scatterlines!(ax, xs[lrs[i]:lrs[i+1]], Hs[lrs[i]:lrs[i+1]])
end
fig
The linear_region
function computes the slope of the largest region:
Δ = linear_region(xs, Hs)[2]
1.8345956051989842
This result is an approximation of the information dimension (because we used q = 1
) of the Lorenz attractor.
The above pipeline is bundled in generalized_dim
. For example, the dimension of the strange attractor of the Systems.henon
map, following the above approach but taking automated steps, is:
using DynamicalSystems
hen = Systems.henon()
tr = trajectory(hen, 200000)
D_hen = generalized_dim(tr; q = 1)
1.221186840581862
As a side note, be sure that you have enough data points, otherwise the values you will get will never be correct, as is demonstrated by J.-P. Eckmann and D. Ruelle (see Physica D 56, pp 185-187 (1992)).
Linear scaling regions
And other utilities, especially linreg
, used in both [generalized_dim
] and grassberger_dim
.
ChaosTools.linear_regions
— Functionlinear_regions(x, y; dxi::Int = 1, tol = 0.25) -> (lrs, tangents)
Identify regions where the curve y(x)
is linear, by scanning the x
-axis every dxi
indices sequentially (e.g. at x[1] to x[5], x[5] to x[10], x[10] to x[15]
and so on if dxi=5
).
If the slope (calculated via linear regression) of a region of width dxi
is approximatelly equal to that of the previous region, within tolerance tol
, then these two regions belong to the same linear region.
Return the indices of x
that correspond to linear regions, lrs
, and the correct tangents
at each region (obtained via a second linear regression at each accumulated region).
ChaosTools.linear_region
— Functionlinear_region(x, y; kwargs...) -> ((ind1, ind2), slope)
Call linear_regions
and identify and return the largest linear region and its slope. The region starts and stops at x[ind1:ind2]
.
The keywords dxi, tol
are propagated as-is to linear_regions
. The keyword ignore_saturation = true
ignores saturation that (sometimes) happens at the start and end of the curve y(x)
, where the curve flattens. The keyword sat = 0.01
decides what saturation is (while abs(y[i]-y[i+1])<sat
we are in a saturation regime).
The keyword warning = true
prints a warning if the linear region is less than 1/3 of the available x-axis.
ChaosTools.linreg
— Functionlinreg(x, y) -> a, b
Perform a linear regression to find the best coefficients so that the curve: z = a + b*x
has the least squared error with y
.
ChaosTools.estimate_boxsizes
— Functionestimate_boxsizes(A::Dataset; kwargs...) → εs
Return k
exponentially spaced values: εs = base .^ range(lower + w, upper + z; length = k)
, that are a good estimate for sizes ε that are used in calculating a Fractal Dimension. It is strongly recommended to regularize
input dataset A
before using this function.
Let d₋
be the minimum pair-wise distance in A
and d₊
the average total length of A
along each of the dimensions of A
. Then lower = log(base, d₋)
and upper = log(base, d₊)
. Because by default w=1, z=-1
, the returned sizes are an order of mangitude larger than the minimum distance, and an order of magnitude smaller than the maximum distance.
Keywords
w = 1, z = -1, k = 20
: as explained above.base = MathConstants.e
: the base used in thelog
function.warning = true
: Print some warnings for bad estimates.autoexpand = true
: If the final estimated range does not cover at least 2 orders of magnitude, it is automatically expanded by settingw -= we
andz -= ze
. You can set different default values to the keywordswe = w, ze = z
.
Correlation sum based dimension
ChaosTools.correlationsum
— Functioncorrelationsum(X, ε; w = 0, norm = Euclidean(), q = 2) → C_q(ε)
Calculate the q
-order correlation sum of X
(Dataset
or timeseries) for a given radius ε
and norm
. They keyword show_progress = true
can be used to display a progress bar for large X
.
correlationsum(X, εs::AbstractVector; w, norm, q) → C_q(ε)
If εs
is a vector, C_q
is calculated for each ε ∈ εs
more efficiently. If also q=2
, we attempt to do further optimizations, if the allocation a matrix of size N×N
is possible.
The function boxed_correlationsum
is faster and should be preferred over this one.
Description
The correlation sum is defined as follows for q=2
:
\[C_2(\epsilon) = \frac{2}{(N-w)(N-w-1)}\sum_{i=1}^{N}\sum_{j=1+w+i}^{N} B(||X_i - X_j|| < \epsilon)\]
for as follows for q≠2
\[C_q(\epsilon) = \left[\frac{1}{\alpha} \sum_{i=w+1}^{N-w} \left[\sum_{j:|i-j| > w} B(||X_i - X_j|| < \epsilon)\right]^{q-1}\right]^{1/(q-1)}\]
where
\[\alpha = (N-2w)(N-2w-1)^{(q-1)}\]
with $N$ the length of X
and $B$ gives 1 if its argument is true
. w
is the Theiler window. See the article of Grassberger for the general definition [Grassberger2007] and the book "Nonlinear Time Series Analysis" [Kantz2003], Ch. 6, for a discussion around choosing best values for w
, and Ch. 11.3 for the explicit definition of the q-order correlationsum.
The scaling of $\log C_q$ versus $\log \epsilon$ approximates the q-order generalized (Rényi) dimension.
ChaosTools.grassberger_dim
— Functiongrassberger_dim(data, εs = estimate_boxsizes(data); kwargs...) → D_C
Use the method of Grassberger and Proccacia[Grassberger1983], and the correction by Theiler[Theiler1986], to estimate the correlation dimension D_C
of the given data
.
This function does something extremely simple:
cm = correlationsum(data, εs; kwargs...)
return linear_region(log.(sizes), log(cm))[2]
i.e. it calculates correlationsum
for various radii and then tries to find a linear region in the plot of the log of the correlation sum versus log(ε). See generalized_dim
for a more thorough explanation.
See also takens_best_estimate
.
ChaosTools.boxed_correlationsum
— Functionboxed_correlationsum(X::Dataset, εs, r0 = maximum(εs); kwargs...) → Cs
Estimate the box assisted q-order correlation sum Cs
out of a dataset X
for each radius in εs
, by splitting the data into boxes of size r0
beforehand. This method is much faster than correlationsum
, provided that the box size r0
is significantly smaller than then the attractor length. A good estimate for r0
is estimate_r0_buenoorovio
.
See correlationsum
for the definition of the correlation sum.
boxed_correlationsum(X::Dataset; kwargs...) → εs, Cs
In this method the minimum inter-point distance and estimate_r0_buenoorovio
are used to estimate good εs
for the calculation, which are also returned.
Keywords
q = 2
: The order of the correlation sum.P = autoprismdim(data)
: The prism dimension.w = 0
: The Theiler window.show_progress = false
: Whether to display a progress bar for the calculation.
Description
C_q(ε)
is calculated for every ε ∈ εs
and each of the boxes to then be summed up afterwards. The method of splitting the data into boxes was implemented according to Theiler[Theiler1987]. w
is the Theiler window. P
is the prism dimension. If P
is unequal to the dimension of the data, only the first P
dimensions are considered for the box distribution (this is called the prism-assisted version). By default P
is choosen automatically.
The function is explicitly optimized for q = 2
but becomes quite slow for q ≠ 2
.
See correlationsum
for the definition of C_q
and also data_boxing
to use the algorithm that splits data into boxes.
ChaosTools.data_boxing
— Functiondata_boxing(data, r0, P) → boxes, contents
Distribute the data
points into boxes of size r0
. Return box positions and the contents of each box as two separate vectors. Implemented according to the paper by Theiler[Theiler1987] improving the algorithm by Grassberger and Procaccia[Grassberger1983]. If P
is smaller than the dimension of the data, only the first P
dimensions are considered for the distribution into boxes.
See also: boxed_correlationsum
.
ChaosTools.estimate_r0_buenoorovio
— Functionestimate_r0_buenoorovio(X::Dataset, P = autoprismdim(X))
Estimates a reasonable size for boxing the time series X
proposed by Bueno-Orovio and Pérez-García[Bueno2007] before calculating the correlation dimension as presented by Theiler[Theiler1983]. If instead of boxes, prisms are chosen everything stays the same but P
is the dimension of the prism. To do so the dimension ν
is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N
points where N
is the number of total data points. An effective size ℓ
of the attractor is calculated by boxing a small subset of size N/10
into boxes of sidelength r_ℓ
and counting the number of filled boxes η_ℓ
.
\[\ell = r_\ell \eta_\ell ^{1/\nu}\]
The optimal number of filled boxes η_opt
is calculated by minimising the number of calculations.
\[\eta_\textrm{opt} = N^{2/3}\cdot \frac{3^\nu - 1}{3^P - 1}^{1/2}.\]
P
is the dimension of the data or the number of edges on the prism that don't span the whole dataset.
Then the optimal boxsize $r_0$ computes as
\[r_0 = \ell / \eta_\textrm{opt}^{1/\nu}.\]
ChaosTools.estimate_r0_theiler
— Functionestimate_r0_theiler(X::Dataset) → r0, ε0
Estimate a reasonable size for boxing the data X
before calculating the boxed_correlationsum
proposed by Theiler[Theiler1987]. Return the boxing size r0
and minimum inter-point distance in X
, ε0
.
To do so the dimension is estimated by running the algorithm by Grassberger and Procaccia[Grassberger1983] with √N
points where N
is the number of total data points. Then the optimal boxsize $r_0$ computes as
\[r_0 = R (2/N)^{1/\nu}\]
where $R$ is the size of the chaotic attractor and $\nu$ is the estimated dimension.
ChaosTools.correlationsum_fixedmass
— Functioncorrelationsum_fixedmass(data, max_j; metric = Euclidean(), M = length(data)) → rs, ys
A fixed mass algorithm for the calculation of a fractal dimension $\Delta$ with max_j
the maximum number of neighbours that should be considered for the calculation. M
defines the number of points considered for the averaging of distances.
Description
The calculated $\Delta$ approximates the information dimension. The implementation here is due to to [Grassberger1988], which defines
\[\Delta \times \overline{\log r^{(j)}} \sim Ψ(j) - \log N\]
where $\Psi(j) = \frac{\text{d} \log Γ(j)}{\text{d} j}$ is the digamma function, rs
= $\overline{\log r^{(j)}}$ and ys
= $\Psi(j) - \log N$ ($N$ is the length of the data).
$\Delta$ can be computed by using linear_region(rs, ys)
.
Takens' best estimate
ChaosTools.takens_best_estimate
— Functiontakens_best_estimate(X, εmax, metric = Chebyshev(),εmin = 0) → D_C, D_C_95u, D_C_95l
Use the so-called "Takens' best estimate" [Takens1985][Theiler1988] method for estimating the correlation dimension D_C
and the upper (D_C_95u
) and lower (D_C_95l
) confidence limit for the given dataset X
.
The original formula is
\[D_C \approx \frac{C(\epsilon_\text{max})}{\int_0^{\epsilon_\text{max}}(C(\epsilon) / \epsilon) \, d\epsilon}\]
where $C$ is the correlationsum
and $\epsilon_\text{max}$ is an upper cutoff. Here we use the later expression
\[D_C \approx - \frac{1}{\eta},\quad \eta = \frac{1}{(N-1)^*}\sum_{[i, j]^*}\log(||X_i - X_j|| / \epsilon_\text{max})\]
where the sum happens for all $i, j$ so that $i < j$ and $||X_i - X_j|| < \epsilon_\text{max}$. In the above expression, the bias in the original paper has already been corrected, as suggested in [Borovkova1999].
The confidence limits are estimated from the log-likelihood function by finding the values of D_C
where the function has fallen by 2 from its maximum, see e.g. [Barlow] chapter 5.3 Because the CLT does not apply (no independent measurements), the limits are not neccesarily symmetric.
According to [Borovkova1999], introducing a lower cutoff εmin
can make the algorithm more stable (no divergence), this option is given but defaults to zero.
If X
comes from a delay coordinates embedding of a timseries x
, a recommended value for $\epsilon_\text{max}$ is std(x)/4
.
Kaplan-Yorke Dimension
ChaosTools.kaplanyorke_dim
— Functionkaplanyorke_dim(λs::AbstractVector)
Calculate the Kaplan-Yorke dimension, a.k.a. Lyapunov dimension[Kaplan1970].
Description
The Kaplan-Yorke dimension is simply the point where cumsum(λs)
becomes zero (interpolated):
\[ D_{KY} = k + \frac{\sum_{i=1}^k \lambda_i}{|\lambda_{k+1}|},\quad k = \max_j \left[ \sum_{i=1}^j \lambda_i > 0 \right].\]
If the sum of the exponents never becomes negative the function will return the length of the input vector.
Useful in combination with lyapunovspectrum
.
Notice that calling this function requires you to pass the Lyapunov exponents in an ordered vector form (largest to smallest). Example:
using DynamicalSystems
hen = Systems.henon()
D_kp = kaplanyorke_dim(lyapunovspectrum(hen, 100000))
1.2586978000453417
- Molteno1993Molteno, T. C. A., Fast O(N) box-counting algorithm for estimating dimensions. Phys. Rev. E 48, R3263(R) (1993)
- Grassberger2007Peter Grassberger (2007) Grassberger-Procaccia algorithm. Scholarpedia, 2(5):3043.
- Kantz2003Kantz, H., & Schreiber, T. (2003). Nonlinear Time Series Analysis, Cambridge University Press.
- Grassberger1983Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983)
- Theiler1986Theiler, Spurious dimension from correlation algorithms applied to limited time-series data. Physical Review A, 34
- Theiler1987Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36
- Theiler1987Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36
- Grassberger1983Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983)
- Bueno2007Bueno-Orovio and Pérez-García, Enhanced box and prism assisted algorithms for computing the correlation dimension. Chaos Solitons & Fractrals, 34(5)
- Theiler1987Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36
- Grassberger1983Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983)
- Theiler1987Theiler, Efficient algorithm for estimating the correlation dimension from a set of discrete points. Physical Review A, 36
- Grassberger1983Grassberger and Proccacia, Characterization of strange attractors, PRL 50 (1983)
- Grassberger1988Peter Grassberger (1988) Finite sample Corrections to Entropy and Dimension Estimates, Physics Letters A 128(6-7)
- Takens1985Takens, On the numerical determination of the dimension of an attractor, in: B.H.W. Braaksma, B.L.J.F. Takens (Eds.), Dynamical Systems and Bifurcations, in: Lecture Notes in Mathematics, Springer, Berlin, 1985, pp. 99–106.
- Theiler1988Theiler, Lacunarity in a best estimator of fractal dimension. Physics Letters A, 133(4–5)
- Borovkova1999Borovkova et al., Consistency of the Takens estimator for the correlation dimension. The Annals of Applied Probability, 9, 05 1999.
- BarlowBarlow, R., Statistics - A Guide to the Use of Statistical Methods in the Physical Sciences. Vol 29. John Wiley & Sons, 1993
- Kaplan1970J. Kaplan & J. Yorke, Chaotic behavior of multidimensional difference equations, Lecture Notes in Mathematics vol. 730, Springer (1979)