Nonlinear Timeseries Analysis

Neighborhoods in a Dataset

Incorporating the excellent performance of NearestNeighbors.jl and the flexibility of AbstractDataset allows us to define a function that calculates a "neighborhood" of a given point, i.e. other points near it. The different "types" of the neighborhoods are subtypes of AbstractNeighborhood.

#ChaosTools.neighborhood — Function.

neighborhood(n, point, tree::KDTree, method::AbstractNeighborhood)

Return a vector of indices which are the neighborhood of point, whose index in the original data is n.

If the original data is data <: AbstractDataset, then use tree = KDTree(data) to obtain the tree instance (which also contains a copy of the data). Both point and n must be provided because the tree has indices in different sorting.

The method can be a subtype of AbstractNeighborhood.

neighborhood works for any subtype of AbstractDataset, for example

R = some_dataset
tree = KDTree(R)
neigh = neighborhood(n, R[n], tree, method)

References

neighborhood simply interfaces the functions knn and inrange from NearestNeighbors.jl by using the last argument, method.

source

#ChaosTools.AbstractNeighborhood — Type.

AbstractNeighborhood

Supertype of methods for deciding the neighborhood of points for a given point.

Concrete subtypes:

FixedMassNeighborhood(K::Int) : The neighborhood of a point consists of the K nearest neighbors of the point.
FixedSizeNeighborhood(ε::Real) : The neighborhood of a point consists of all neighbors that have distance < ε from the point.

Notice that these distances are always computed using the Euclidean() distance in D-dimensional space.

source

Delay Coordinates Reconstruction

A timeseries recorded in some manner from a dynamical system can be used to gain information about the dynamics of the entire phase-space of the system. This can be done by reconstructing a new phase-space from the timeseries. One method that can do this is what is known as delay coordinates embedding.

This is done through the Reconstruction interface:

#DynamicalSystemsBase.Reconstruction — Type.

Reconstruction{D, T, τ} <: AbstractDataset{D, T}

D-dimensional delay-coordinates reconstruction object with delay τ, created from a timeseries s with T type numbers.

Use Reconstruction(s::AbstractVector{T}, D, τ) to create an instance.

Description

The $n$ th row of a Reconstruction is the D-dimensional vector

$(s(n), s(n+\tau), s(n+2\tau), \dots, s(n+(D-1)\tau))$

The reconstruction object R can have same invariant quantities (like e.g. lyapunov exponents) with the original system that the timeseries were recorded from, for proper D and τ [1, 2].

R can be accessed similarly to a Dataset:

s = rand(1e6)
R = Reconstruction(s, 4, 1) # dimension 4 and delay 1
R[3] # third point of reconstruction, ≡ (s[3], s[4], s[5], s[6])
R[1, 2] # Second element of first point of reconstruction, ≡ s[2]

and can also be given to all functions that accept a Dataset (like e.g. generalized_dim from module ChaosTools).

The functions dimension(R) and delay(R) return D and τ respectively.

References

[1] : F. Takens, Detecting Strange Attractors in Turbulence — Dynamical Systems and Turbulence, Lecture Notes in Mathematics 366, Springer (1981)

[2] : T. Sauer et al., J. Stat. Phys. 65, pp 579 (1991)

source

As an example, let's pass a Reconstruction into e.g. a method that calculates the attractor dimension:

using DynamicalSystems
he = Systems.henon()
ts = trajectory(he, 100000)
D1 = information_dim(ts) # around 1.20
x = ts[:, 1] # some "recorded" timeseries
R = Reconstruction(x, 2, 1) # delay coords. reconstruction
R[1] # first point of reconstruction, ≡ (x[1], x[2])
R[:, 2] # Second COLUMN of the reconstruction, ≡ x[2:end] since τ=1
D2 = information_dim(R) #around 1.20
println("D2 - D1 = $(abs(D2- D1))")

D2 - D1 = 0.0423887826453242

The 2 numbers D1 and D2 are very close, but of course I knew before-hand good parameter values for D and τ (I cheated, huhu!).

Estimating Reconstruction Parameters

The following functions estimate good values that can be used in Reconstruction:

#DynamicalSystemsBase.estimate_delay — Function.

estimate_delay(s) -> τ

Estimate an optimal delay to be used in Reconstruction, by performing an exponential fit to the abs.(c) with c the auto-correlation function of s. Return the exponential decay time τ rounded to an integer.

source

Numerical Lyapunov Exponent

Given any timeseries, one can first obtain a Reconstruction from it using delay coordinates, and then calculate a maximum Lyapunov exponent for it. This is done with

#ChaosTools.numericallyapunov — Function.

numericallyapunov(R::Reconstruction, ks;  refstates, distance, method)

Return E = [E(k) for k ∈ ks], where E(k) is the average logarithmic distance for nearby states that are evolved in time for k steps (k must be integer).

Keyword Arguments

refstates::AbstractVector{Int} = 1:(length(R) - ks[end]) : Vector of indices that notes which states of the reconstruction should be used as "reference states", which means that the algorithm is applied for all state indices contained in refstates.
method::AbstractNeighborhood = FixedMassNeighborhood(1) : The method to be used when evaluating the neighborhood of each reference state. See AbstractNeighborhood or neighborhood for more info.
distance::Metric = Cityblock() : The distance function used in the logarithmic distance of nearby states. The allowed distances are Cityblock() and Euclidean(). See below for more info.

Description

If the reconstruction exhibits exponential divergence of nearby states, then it should clearly hold

$E(k) \approx \lambda\Delta t k + E(0)$

for a well defined region in the k axis, where $\lambda$ is the approximated maximum Lyapunov exponent. $\Delta t$ is the time between samples in the original timeseries. You can use linear_region with arguments (ks .* Δt, E) to identify the slope (= $\lambda$ ) immediatelly, assuming you have choosen sufficiently good ks such that the linear scaling region is bigger than the saturated region.

The algorithm used in this function is due to Parlitz [1], which itself expands upon Kantz [2]. In sort, for each reference state a neighborhood is evaluated. Then, for each point in this neighborhood, the logarithmic distance between reference state and neighborhood state is calculated as the "time" index k increases. The average of the above over all neighborhood states over all reference states is the returned result.

If the Metric is Euclidean() then use the Euclidean distance of the full D-dimensional points (distance $d_E$ in ref. [1]). If however the Metric is Cityblock(), calculate the absolute distance of only the first elements of the m+k and n+k points of the reconstruction R(distance $d_F$ in ref. [1]). Notice that the distances used are defined in the package Distances.jl, but are re-exported here for ease-of-use.

This function assumes that the Theiler window (see [1]) is the same as the delay time, $w = \tau$ .

References

[1] : Skokos, C. H. et al., Chaos Detection and Predictability - Chapter 1 (section 1.3.2), Lecture Notes in Physics 915, Springer (2016)

[2] : Kantz, H., Phys. Lett. A 185, pp 77–87 (1994)

source

The function numericallyapunov has a total of 4 different approaches for the algorithmic process, by combining 2 types of distances with 2 types of neighborhoods.

Example of Numerical Lyapunov computation

using DynamicalSystems, PyPlot

ds = Systems.henon()
data = trajectory(ds, 100000)
x = data[:, 1] #fake measurements for the win!

ks = 1:20
ℜ = 1:10000
fig = figure(figsize=(10,6))
i = 1

for (i, di) in enumerate([Euclidean(), Cityblock()])
  subplot(1, 2, i)
  i+=1
  method = FixedMassNeighborhood(2)

  title("Distance: $(di)", size = 18)
  for D in [2, 4, 7]
    R = Reconstruction(x, D, 1)
    E = numericallyapunov(R, ks;
    refstates = ℜ, distance = di, method = method)
    # The following operation:
    Δt = 1
    λ = linear_region(ks.*Δt, E)[2]
    # gives the linear slope, i.e. the Lyapunov exponent
    plot(ks-1, E-E[1], label = "D=$D, λ=$(round(λ, 3))")
    legend()
    tight_layout()
  end


end

which gives the result Lyapunov exponent of a timeseries

Bad Time-axis (`ks`) length

Large ks

This simply cannot be stressed enough! It is just too easy to overshoot the range at which the exponential expansion region is valid!

Let's revisit the example of the previous section:

ds = Systems.henon()
data = trajectory(ds, 100000)
x = data[:, 1]

The timeseries of length 100000 could be considered big. A time length of 100 seems very small. Yet it turns out it is way too big! The following

ks = 1:100
R = Reconstruction(x, 2, 1)
E = numericallyapunov(R, ks, method = FixedMassNeighborhood(2))
figure()
plot(ks-1, E-E[1])
println("Lyappunov: ", linear_region(ks, E)[2])

gives this plot: Bad time-vector example and prints

Lyapunov: 0.4161...

Notice that even though this value for the Lyapunov exponent is correct, it happened to be correct simply due to the jitter of the saturated region. Since the saturated region is much bigger than the linear scaling region, if it wasn't that jittery the function linear_region would not give the scaling of the linear region, but instead a slope near 0! (or if you were to give bigger tolerance as a keyword argument)

Case of a Continuous system

The process for continuous systems works identically with discrete, but one must be a bit more thoughtful when choosing parameters. The following example has comments to help the users get familiar with the process:

using DynamicalSystems, PyPlot

ds = Systems.lorenz() # Max lyapunov is around 0.90
# create a timeseries of 1 dimension
dt = 0.05
x = trajectory(ds, 1000.0; dt = dt)[:, 1]

τ1 = estimate_delay(x) #gives 7

# Reconstruct it
figure()
for D in [4, 8], τ in [τ1, 15]
    R = Reconstruction(x, D, τ)

    # I now know that I have to use much bigger ks than 1:20, because this is a
    # continuous case! (See reference given in `numericallyapunovs`)
    ks1 = 0:200
    # I also know that I do not need that dense computations, since 1 increment
    # in k means increment of 0.05 real time
    ks2 = 0:4:200

    # Calculate lyapunovs:
    method = FixedMassNeighborhood(5) #5 nearest neighbors of each state

    # E1 = numericallyapunov(R, ks1; method = method)
    # λ1 = linear_region(ks1 .* dt, E1)[2]
    E2 = numericallyapunov(R, ks2; method = method)
    λ2 = linear_region(ks2 .* dt, E2)[2]


    # plot(ks1,E1.-E1[1], label = "dense, D=$(D), τ=$(τ), λ=$(round(λ1, 3))")
    plot(ks2,E2.-E2[1], label = "D=$(D), τ=$(τ), λ=$(round(λ2, 3))")
end

legend()
xlabel("k (0.05×t)")
ylabel("E - E(0)")
title("Continuous Reconstruction Lyapunov")
tight_layout()

which produces: Continuous Reconstruction exaple As you can see, using τ = 15 makes almost no sense! The estimates with τ = 7 though are very good (the actual value is around λ ≈ 0.89...).

Nonlinear Timeseries Analysis

Neighborhoods in a Dataset

Delay Coordinates Reconstruction

Estimating Reconstruction Parameters

Numerical Lyapunov Exponent

Example of Numerical Lyapunov computation

Bad Time-axis (ks) length

Case of a Continuous system

Bad Time-axis (`ks`) length