Numerical Data
StateSpaceSets - Module
StateSpaceSets.jl
A Julia package that provides functionality for state space sets. These are collections of points whose size (called the dimension) is fixed and known from the type. It is used in several projects in the JuliaDynamics organization, such as DynamicalSystems.jl or CausalityTools.jl.
The main export of StateSpaceSets is the concrete type StateSpaceSet. The package also provides functionality for distances, neighbor searches, sampling, and normalization.
To install it you may run import Pkg; Pkg.add("StateSpaceSets"). However, there is no real reason to install this package directly, as it is re-exported by all downstream packages that use it.
Previously, StateSpaceSets.jl was part of DelayEmbeddings.jl.
The word "timeseries" can be confusing, because it can mean a univariate (also called scalar or one-dimensional) timeseries or a multivariate (also called multi-dimensional) timeseries. To resolve this confusion, in DynamicalSystems.jl we have the following convention: "timeseries" is always univariate! It refers to a one-dimensional vector of numbers, which exists with respect to some other one-dimensional vector of numbers that corresponds to a time vector. On the other hand, the word "dataset" refers to a multi-dimensional timeseries, which is of course simply a group/set of one-dimensional timeseries represented as a StateSpaceSet. In some documentation strings we use the word "trajectory" instead of "dataset", which means an ordered multivariate timeseries. This is typically the output of the function trajectory, or the delay embedding of a timeseries via embed, both of which are also represented as a StateSpaceSet.
StateSpaceSet
Trajectories, and in general sets in state space, are represented by a structure called StateSpaceSet in DynamicalSystems.jl (while timeseries are always standard Julia Vectors). It is recommended to always standardize datasets.
StateSpaceSets.StateSpaceSet - Type
StateSpaceSet{D, T} <: AbstractStateSpaceSet{D,T}
A dedicated interface for sets in a state space. It is an ordered container of equally-sized points of length D. Each point is represented by SVector{D, T}. The data are a standard Julia Vector{SVector}, and can be obtained with vec(ssset::StateSpaceSet). Typically the order of points in the set is the time direction, but it doesn't have to be.
When indexed with 1 index, StateSpaceSet is like a vector of points. When indexed with 2 indices it behaves like a matrix whose columns are the timeseries of the individual variables. When iterated over, it iterates over its contained points. See description of indexing below for more.
StateSpaceSet also supports almost all sensible vector operations like append!, push!, hcat, eachrow, among others.
Description of indexing
In the following let i, j be integers, typeof(X) <: AbstractStateSpaceSet and v1, v2 be <: AbstractVector{Int} (v1, v2 could also be ranges, and for performance benefits make v2 an SVector{Int}).
- X[i] == X[i, :] gives the ith point (returns an SVector)
- X[v1] == X[v1, :] returns a StateSpaceSet with the points in those indices
- X[:, j] gives the jth variable timeseries (or collection), as a Vector
- X[v1, v2], X[:, v2] returns a StateSpaceSet with the appropriate entries (first indices being "time"/point index, while second being variables)
- X[i, j] gives the value of the jth variable at the ith timepoint
Use Matrix(ssset)
or StateSpaceSet(matrix)
to convert. It is assumed that each column of the matrix
is one variable. If you have various timeseries vectors x, y, z, ...
pass them like StateSpaceSet(x, y, z, ...)
. You can use columns(dataset)
to obtain the reverse, i.e. all columns of the dataset in a tuple.
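The indexing and conversion rules above can be sketched on a tiny set. This is a minimal illustration that assumes StateSpaceSets.jl is installed; the variable names are arbitrary and not part of the package.

```julia
using StateSpaceSets

# Two timeseries (variables) of equal length, combined into a 2-dimensional set
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
X = StateSpaceSet(x, y)

p   = X[2]       # 2nd point, an SVector with entries (2.0, 20.0)
xs  = X[:, 1]    # 1st variable as a Vector, equal to x
sub = X[2:3]     # StateSpaceSet containing points 2 and 3
val = X[3, 2]    # 2nd variable at the 3rd point

M = Matrix(X)          # 4×2 matrix, one column per variable
cx, cy = columns(X)    # back to the individual timeseries
```

Note that the single-index form returns one point, while the two-index form with a colon returns one variable across all points.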
StateSpaceSets.standardize - Function
standardize(d::StateSpaceSet) → r
Create a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.
standardize(x::AbstractVector{<:Real}) = (x .- mean(x)) ./ std(x)
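As a quick check of what standardization does, the sketch below builds a set whose two variables have very different offsets and scales and inspects the column statistics afterwards. The test signal is an arbitrary choice for illustration.

```julia
using StateSpaceSets
using Statistics: mean, std

# Two variables with very different offsets and scales
t = range(0, 10; length = 200)
X = StateSpaceSet(5 .+ 3 .* sin.(t), 100 .* cos.(t))

Y = standardize(X)

# Per-column statistics of the standardized set
column_means = [mean(c) for c in columns(Y)]
column_stds  = [std(c) for c in columns(Y)]
```

Up to floating-point error, every entry of column_means should be 0 and every entry of column_stds should be 1.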
In essence a StateSpaceSet is simply a wrapper for a Vector of SVectors. However, it is visually represented as a matrix, similarly to how numerical data would be printed on a spreadsheet (with time being the column direction). It also offers a lot more functionality than just pretty-printing. Besides the examples in the documentation string, you can e.g. iterate over data points:
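using DynamicalSystems
hen = Systems.henon()
# in DynamicalSystems.jl v3, trajectory returns the StateSpaceSet and its time vector
data, t = trajectory(hen, 10000)
for point in data
    # each point is an SVector; process it here
end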
Most functions from DynamicalSystems.jl that manipulate and use multidimensional data are expecting a StateSpaceSet. This allows us to define efficient methods that coordinate well with each other, like e.g. embed.
StateSpaceSet Functions
StateSpaceSets.minima - Function
minima(dataset)
Return an SVector that contains the minimum elements of each timeseries of the dataset.
StateSpaceSets.maxima - Function
maxima(dataset)
Return an SVector that contains the maximum elements of each timeseries of the dataset.
StateSpaceSets.minmaxima - Function
minmaxima(dataset)
Return minima(dataset), maxima(dataset) without doing the computation twice.
StateSpaceSets.columns - Function
columns(ssset) -> x, y, z, ...
Return the individual columns of the state space set allocated as Vectors. Equivalent to collect(eachcol(ssset)).
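A short sketch of the extrema functions on a three-point set, assuming StateSpaceSets.jl is loaded. The data values are made up for illustration.

```julia
using StateSpaceSets

# First variable: 1, -2, 3; second variable: 0.5, 4, -1
X = StateSpaceSet([1.0, -2.0, 3.0], [0.5, 4.0, -1.0])

mi = minima(X)             # per-variable minima: (-2.0, -1.0)
ma = maxima(X)             # per-variable maxima: (3.0, 4.0)
mi2, ma2 = minmaxima(X)    # both at once, in a single pass
```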
StateSpaceSet distances
Two datasets
StateSpaceSets.set_distance - Function
set_distance(ssset1, ssset2 [, distance])
Calculate a distance between two StateSpaceSets, i.e., a distance defined between sets of points, as dictated by distance.
Possible distance types are:
- Centroid, which is the default, and 100s of times faster than the rest
- Hausdorff
- StrictlyMinimumDistance
StateSpaceSets.Hausdorff - Type
Hausdorff(metric = Euclidean())
A distance that can be used in set_distance. The Hausdorff distance is the greatest of all the distances from a point in one set to the closest point in the other set. The distance is calculated with the metric given to Hausdorff, which defaults to Euclidean.
Hausdorff is 2x slower than StrictlyMinimumDistance, however it is a proper metric in the space of sets of state space sets.
StateSpaceSets.Centroid - Type
Centroid(metric = Euclidean())
A distance that can be used in set_distance. The Centroid method returns the distance (according to metric) between the centroids (a.k.a. centers of mass) of the sets.
metric can be any function that takes in two static vectors and returns a positive definite number to use as a distance (and typically is a Metric from Distances.jl).
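To make the difference between the three distance types concrete, here is a small sketch with two two-point sets whose distances can be worked out by hand. It assumes that StrictlyMinimumDistance can be constructed with no arguments; the sets themselves are arbitrary.

```julia
using StateSpaceSets

A = StateSpaceSet([0.0, 2.0], [0.0, 0.0])   # points (0, 0) and (2, 0)
B = StateSpaceSet([0.0, 6.0], [3.0, 3.0])   # points (0, 3) and (6, 3)

# Distance between the centroids (1, 0) and (3, 3); Centroid() is the default
d_centroid = set_distance(A, B)

# Greatest distance from a point of one set to its closest point in the other set
d_hausdorff = set_distance(A, B, Hausdorff())

# Smallest distance between any point of A and any point of B
d_minimum = set_distance(A, B, StrictlyMinimumDistance())
```

By hand, the centroid distance is sqrt(13), the closest pair is (0, 0) and (0, 3) at distance 3, and the point (6, 3) is 5 away from its nearest point in A, which sets the Hausdorff distance.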
Sets of datasets
StateSpaceSets.setsofsets_distances - Function
setsofsets_distances(a₊, a₋ [, distance]) → distances
Calculate distances between sets of StateSpaceSets. Here a₊, a₋ are containers of StateSpaceSets, and the returned distances are dictionaries of distances. Specifically, distances[i][j] is the distance from the set under key i of a₊ to the set under key j of a₋. Notice that distances from a₋ to a₊ are not computed at all (assuming symmetry in the distance function).
The distance can be as in set_distance, or it can be an arbitrary function that takes as input two state space sets and returns any positive-definite number as their "distance".
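The nested-dictionary layout of the result can be sketched as follows. Using Dict containers with integer keys is an assumption made for this illustration, and the sets are arbitrary.

```julia
using StateSpaceSets

# Two containers of sets, keyed by integer labels
a₊ = Dict(
    1 => StateSpaceSet([0.0, 2.0], [0.0, 0.0]),
    2 => StateSpaceSet([10.0, 12.0], [0.0, 0.0]),
)
a₋ = Dict(
    1 => StateSpaceSet([0.0, 2.0], [1.0, 1.0]),
    2 => StateSpaceSet([10.0, 12.0], [5.0, 5.0]),
)

distances = setsofsets_distances(a₊, a₋, Centroid())

# Distance from set 1 of a₊ to set 2 of a₋
d12 = distances[1][2]
```

Each entry should agree with calling set_distance on the corresponding pair of sets directly.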
StateSpaceSet I/O
Input/output functionality for an AbstractStateSpaceSet is already achieved using base Julia, specifically writedlm and readdlm. To write and read a dataset, simply do:
using StateSpaceSets
using DelimitedFiles
data = StateSpaceSet(rand(1000, 2))
# Write and read using the delimiter ','
writedlm("data.txt", data, ',')
# Don't forget to convert the matrix to a StateSpaceSet when reading
data = StateSpaceSet(readdlm("data.txt", ',', Float64))
Neighborhoods
Neighborhoods refer to the common act of finding points in a dataset that are nearby a given point (which typically belongs in the dataset). DynamicalSystems.jl bases this interface on Neighborhood.jl. You can go to its documentation if you are interested in finding neighbors in a dataset for e.g. a custom algorithm implementation.
For DynamicalSystems.jl, what is relevant are the two types of neighborhoods that exist:
Neighborhood.NeighborNumber - Type
NeighborNumber(k::Int) <: SearchType
Search type representing the k nearest neighbors of the query (or approximate neighbors, depending on the search structure).
Neighborhood.WithinRange - Type
WithinRange(r::Real) <: SearchType
Search type representing all neighbors with distance ≤ r from the query (according to the search structure's metric).
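The two search types can be tried out directly with Neighborhood.jl. The sketch below assumes that Neighborhood.jl is installed and that it provides searchstructure, isearch, search, KDTree and Euclidean under these names; check its documentation before relying on the exact signatures.

```julia
using StateSpaceSets
using Neighborhood   # assumed to provide searchstructure, isearch, search, KDTree, Euclidean

# Ten points on a straight line: (1, 0), (2, 0), ..., (10, 0)
X = StateSpaceSet(collect(1.0:10.0), zeros(10))

# Build the search structure once, then query it as often as needed
tree  = searchstructure(KDTree, vec(X), Euclidean())
query = X[5]

# Index of the single nearest neighbor (the query point itself, since it is in the set)
nearest = isearch(tree, query, NeighborNumber(1))

# Indices and distances of all points within radius 1.5 of the query
idxs, ds = search(tree, query, WithinRange(1.5))
```

NeighborNumber fixes how many neighbors are returned, while WithinRange fixes the radius and lets the count vary.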
Theiler window
The Theiler window is a concept that is useful when finding neighbors in a dataset that is coming from the sampling of a continuous dynamical system. As demonstrated in the figure below, it tries to eliminate spurious "correlations" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory (e.g. by giving small sampling time in trajectory).
The figure below demonstrates a typical WithinRange search around the black point with index i. Black, red and green points are found neighbors, but points with indices j that satisfy |i-j| ≤ w should not be counted as "true" neighbors. These neighbors are typically the same around any state space point, and thus wrongly bias calculations by providing a non-zero baseline of neighbors. For the sketch below, w=3 would have been used.
Typically a good choice for w coincides with the choice of an optimal delay time, see estimate_delay, for any of the timeseries of the dataset.
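The effect of the window can be sketched with a brute-force range search in plain Julia. The helper neighbors_with_theiler is hypothetical and written only for this illustration; it is not part of StateSpaceSets.jl or Neighborhood.jl.

```julia
# Brute-force range search around points[i] that skips every index j with |i - j| ≤ w.
# Hypothetical helper for illustration only, not a package function.
function neighbors_with_theiler(points, i, r, w)
    found = Int[]
    for j in eachindex(points)
        abs(i - j) ≤ w && continue                      # inside the Theiler window: skip
        dist = sqrt(sum(abs2, points[i] .- points[j]))  # Euclidean distance
        dist ≤ r && push!(found, j)
    end
    return found
end

# Densely sampled circle: consecutive points are close in space and in time
ts = range(0, 2π; length = 201)[1:end-1]
points = [[cos(t), sin(t)] for t in ts]

i = 50
without_window = neighbors_with_theiler(points, i, 0.1, 0)   # w = 0 excludes only the point itself
with_window    = neighbors_with_theiler(points, i, 0.1, 3)   # w = 3 also excludes indices 47 to 53
```

Here every neighbor found with w = 0 is merely a temporally adjacent sample, so with w = 3 the search returns no neighbors at all.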
Samplers
StateSpaceSets.statespace_sampler - Function
statespace_sampler(region [, seed = 42]) → sampler, isinside
A function that facilitates sampling points randomly and uniformly in a state space region. It generates two functions:
- sampler is a 0-argument function that when called generates a random point inside a state space region. The point is always a Vector for type stability irrespective of dimension. Generally, the generated point should be copied if it needs to be stored (i.e., calling sampler() utilizes a shared vector). sampler is a thread-safe function.
- isinside is a 1-argument function that returns true if the given state space point is inside the region.
The region can be an instance of any of the following types (input arguments if not specified are vectors of length D, with D the state space dimension):
- HSphere(radius::Real, center): points inside the hypersphere (boundary excluded). Convenience method HSphere(radius::Real, D::Int) makes the center a D-long vector of zeros.
- HSphereSurface(radius, center): points on the hypersphere surface. Same convenience method as above is possible.
- HRectangle(mins, maxs): points in [min, max) for the bounds along each dimension.
The random number generator is always Xoshiro with the given seed.
statespace_sampler(grid::NTuple{N, AbstractRange} [, seed])
If given a grid that is a tuple of AbstractVectors, the minimum and maximum of the vectors are used to make an HRectangle region.
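A minimal usage sketch of the sampler for the three kinds of input described above, assuming the positional seed argument shown in the signature. The regions, the seed value and the variable names are arbitrary.

```julia
using StateSpaceSets

# Unit disk centered at the origin of a 2-dimensional state space
sampler, isinside = statespace_sampler(HSphere(1.0, 2), 1234)
p = copy(sampler())    # copy, because sampler() reuses a shared vector

# Box [0, 1) × [-1, 1)
box_sampler, box_isinside = statespace_sampler(HRectangle([0.0, -1.0], [1.0, 1.0]), 1234)
q = copy(box_sampler())

# The same box, derived from the extrema of a grid of ranges
grid_sampler, grid_isinside = statespace_sampler((0:0.1:1, -1:0.1:1), 1234)
g = copy(grid_sampler())
```

The returned isinside functions can then be used to test arbitrary points, for example box_isinside([2.0, 0.0]) should be false.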
StateSpaceSets.HSphere - Type
HSphere(r::Real, center::Vector)
HSphere(r::Real, D::Int)
A state space region denoting all points within a hypersphere.
StateSpaceSets.HSphereSurface - Type
HSphereSurface(r::Real, center::Vector)
HSphereSurface(r::Real, D::Int)
A state space region denoting all points on the surface (boundary) of a hypersphere.
StateSpaceSets.HRectangle - Type
HRectangle(mins::Vector, maxs::Vector)
A state space region denoting all points within the hyperrectangle.