Nonlinear Timeseries Analysis
Neighborhoods in a Dataset
Incorporating the excellent performance of NearestNeighbors.jl and the flexibility of AbstractDataset
allows us to define a function that calculates a "neighborhood" of a given point, i.e. other points near it. The different "types" of the neighborhoods are subtypes of AbstractNeighborhood
.
#ChaosTools.neighborhood
— Function.
neighborhood(n, point, tree::KDTree, method::AbstractNeighborhood)
Return a vector of indices which are the neighborhood of point
, whose index in the original data is n
.
If the original data is data <: AbstractDataset
, then use tree = KDTree(data)
to obtain the tree
instance (which also contains a copy of the data). Both point
and n
must be provided because the tree
has indices in different sorting.
The method
can be a subtype of AbstractNeighborhood
.
neighborhood
works for any subtype of AbstractDataset
, for example
R = some_dataset tree = KDTree(R) neigh = neighborhood(n, R[n], tree, method)
References
neighborhood
simply interfaces the functions knn
and inrange
from NearestNeighbors.jl by using the last argument, method
.
#ChaosTools.AbstractNeighborhood
— Type.
AbstractNeighborhood
Supertype of methods for deciding the neighborhood of points for a given point.
Concrete subtypes:
FixedMassNeighborhood(K::Int)
: The neighborhood of a point consists of theK
nearest neighbors of the point.FixedSizeNeighborhood(ε::Real)
: The neighborhood of a point consists of all neighbors that have distance <ε
from the point.
Notice that these distances are always computed using the Euclidean()
distance in D
-dimensional space.
See also neighborhood
or numericallyapunov
.
Delay Coordinates Reconstruction
A timeseries recorded in some manner from a dynamical system can be used to gain information about the dynamics of the entire phase-space of the system. This can be done by reconstructing a new phase-space from the timeseries. One method that can do this is what is known as delay coordinates embedding.
This is done through the Reconstruction
interface:
#DynamicalSystemsBase.Reconstruction
— Type.
Reconstruction{D, T, τ} <: AbstractDataset{D, T}
D
-dimensional delay-coordinates reconstruction object with delay τ
, created from a timeseries s
with T
type numbers.
Use Reconstruction(s::AbstractVector{T}, D, τ)
to create an instance.
Description
The nth row of a Reconstruction
is the D
-dimensional vector
The reconstruction object R
can have same invariant quantities (like e.g. lyapunov exponents) with the original system that the timeseries were recorded from, for proper D
and τ
[1, 2].
R
can be accessed similarly to a Dataset
:
s = rand(1e6) R = Reconstruction(s, 4, 1) # dimension 4 and delay 1 R[3] # third point of reconstruction, ≡ (s[3], s[4], s[5], s[6]) R[1, 2] # Second element of first point of reconstruction, ≡ s[2]
and can also be given to all functions that accept a Dataset
(like e.g. generalized_dim
from module ChaosTools
).
The functions dimension(R)
and delay(R)
return D
and τ
respectively.
References
[1] : F. Takens, Detecting Strange Attractors in Turbulence — Dynamical Systems and Turbulence, Lecture Notes in Mathematics 366, Springer (1981)
[2] : T. Sauer et al., J. Stat. Phys. 65, pp 579 (1991)
As an example, let's pass a Reconstruction
into e.g. a method that calculates the attractor dimension:
using DynamicalSystems he = Systems.henon() ts = trajectory(he, 100000) D1 = information_dim(ts) # around 1.20 x = ts[:, 1] # some "recorded" timeseries R = Reconstruction(x, 2, 1) # delay coords. reconstruction R[1] # first point of reconstruction, ≡ (x[1], x[2]) R[:, 2] # Second COLUMN of the reconstruction, ≡ x[2:end] since τ=1 D2 = information_dim(R) #around 1.20 println("D2 - D1 = $(abs(D2- D1))")
D2 - D1 = 0.0423887826453242
The 2 numbers D1
and D2
are very close, but of course I knew before-hand good parameter values for D
and τ
(I cheated, huhu!).
Estimating Reconstruction Parameters
The following functions estimate good values that can be used in Reconstruction
:
#DynamicalSystemsBase.estimate_delay
— Function.
estimate_delay(s) -> τ
Estimate an optimal delay to be used in Reconstruction
, by performing an exponential fit to the abs.(c)
with c
the auto-correlation function of s
. Return the exponential decay time τ
rounded to an integer.
Numerical Lyapunov Exponent
Given any timeseries, one can first obtain a Reconstruction
from it using delay coordinates, and then calculate a maximum Lyapunov exponent for it. This is done with
#ChaosTools.numericallyapunov
— Function.
numericallyapunov(R::Reconstruction, ks; refstates, distance, method)
Return E = [E(k) for k ∈ ks]
, where E(k)
is the average logarithmic distance for nearby states that are evolved in time for k
steps (k
must be integer).
Keyword Arguments
refstates::AbstractVector{Int} = 1:(length(R) - ks[end])
: Vector of indices that notes which states of the reconstruction should be used as "reference states", which means that the algorithm is applied for all state indices contained inrefstates
.method::AbstractNeighborhood = FixedMassNeighborhood(1)
: The method to be used when evaluating the neighborhood of each reference state. SeeAbstractNeighborhood
orneighborhood
for more info.distance::Metric = Cityblock()
: The distance function used in the logarithmic distance of nearby states. The allowed distances areCityblock()
andEuclidean()
. See below for more info.
Description
If the reconstruction exhibits exponential divergence of nearby states, then it should clearly hold
for a well defined region in the k
axis, where \lambda is the approximated maximum Lyapunov exponent. \Delta t is the time between samples in the original timeseries. You can use linear_region
with arguments (ks .* Δt, E)
to identify the slope (= \lambda) immediatelly, assuming you have choosen sufficiently good ks
such that the linear scaling region is bigger than the saturated region.
The algorithm used in this function is due to Parlitz [1], which itself expands upon Kantz [2]. In sort, for each reference state a neighborhood is evaluated. Then, for each point in this neighborhood, the logarithmic distance between reference state and neighborhood state is calculated as the "time" index k
increases. The average of the above over all neighborhood states over all reference states is the returned result.
If the Metric
is Euclidean()
then use the Euclidean distance of the full D
-dimensional points (distance d_E in ref. [1]). If however the Metric
is Cityblock()
, calculate the absolute distance of only the first elements of the m+k
and n+k
points of the reconstruction R
(distance d_F in ref. [1]). Notice that the distances used are defined in the package Distances.jl, but are re-exported here for ease-of-use.
This function assumes that the Theiler window (see [1]) is the same as the delay time, w = \tau.
References
[1] : Skokos, C. H. et al., Chaos Detection and Predictability - Chapter 1 (section 1.3.2), Lecture Notes in Physics 915, Springer (2016)
[2] : Kantz, H., Phys. Lett. A 185, pp 77–87 (1994)
The function numericallyapunov
has a total of 4 different approaches for the algorithmic process, by combining 2 types of distances with 2 types of neighborhoods.
Example of Numerical Lyapunov computation
using DynamicalSystems, PyPlot ds = Systems.henon() data = trajectory(ds, 100000) x = data[:, 1] #fake measurements for the win! ks = 1:20 ℜ = 1:10000 fig = figure(figsize=(10,6)) i = 1 for (i, di) in enumerate([Euclidean(), Cityblock()]) subplot(1, 2, i) i+=1 method = FixedMassNeighborhood(2) title("Distance: $(di)", size = 18) for D in [2, 4, 7] R = Reconstruction(x, D, 1) E = numericallyapunov(R, ks; refstates = ℜ, distance = di, method = method) # The following operation: Δt = 1 λ = linear_region(ks.*Δt, E)[2] # gives the linear slope, i.e. the Lyapunov exponent plot(ks-1, E-E[1], label = "D=$D, λ=$(round(λ, 3))") legend() tight_layout() end end
which gives the result
Bad Time-axis (ks
) length
Large ks
This simply cannot be stressed enough! It is just too easy to overshoot the range at which the exponential expansion region is valid!
Let's revisit the example of the previous section:
ds = Systems.henon() data = trajectory(ds, 100000) x = data[:, 1]
The timeseries of length 100000 could be considered big. A time length of 100 seems very small. Yet it turns out it is way too big! The following
ks = 1:100 R = Reconstruction(x, 2, 1) E = numericallyapunov(R, ks, method = FixedMassNeighborhood(2)) figure() plot(ks-1, E-E[1]) println("Lyappunov: ", linear_region(ks, E)[2])
gives this plot: and prints
Lyapunov: 0.4161...
Notice that even though this value for the Lyapunov exponent is correct, it happened to be correct simply due to the jitter of the saturated region. Since the saturated region is much bigger than the linear scaling region, if it wasn't that jittery the function linear_region
would not give the scaling of the linear region, but instead a slope near 0! (or if you were to give bigger tolerance as a keyword argument)
Case of a Continuous system
The process for continuous systems works identically with discrete, but one must be a bit more thoughtful when choosing parameters. The following example has comments to help the users get familiar with the process:
using DynamicalSystems, PyPlot ds = Systems.lorenz() # Max lyapunov is around 0.90 # create a timeseries of 1 dimension dt = 0.05 x = trajectory(ds, 1000.0; dt = dt)[:, 1] τ1 = estimate_delay(x) #gives 7 # Reconstruct it figure() for D in [4, 8], τ in [τ1, 15] R = Reconstruction(x, D, τ) # I now know that I have to use much bigger ks than 1:20, because this is a # continuous case! (See reference given in `numericallyapunovs`) ks1 = 0:200 # I also know that I do not need that dense computations, since 1 increment # in k means increment of 0.05 real time ks2 = 0:4:200 # Calculate lyapunovs: method = FixedMassNeighborhood(5) #5 nearest neighbors of each state # E1 = numericallyapunov(R, ks1; method = method) # λ1 = linear_region(ks1 .* dt, E1)[2] E2 = numericallyapunov(R, ks2; method = method) λ2 = linear_region(ks2 .* dt, E2)[2] # plot(ks1,E1.-E1[1], label = "dense, D=$(D), τ=$(τ), λ=$(round(λ1, 3))") plot(ks2,E2.-E2[1], label = "D=$(D), τ=$(τ), λ=$(round(λ2, 3))") end legend() xlabel("k (0.05×t)") ylabel("E - E(0)") title("Continuous Reconstruction Lyapunov") tight_layout()
which produces: As you can see, using
τ = 15
makes almost no sense! The estimates with τ = 7
though are very good (the actual value is around λ ≈ 0.89...
).