# Numerical Data

`StateSpaceSets`

— Module**StateSpaceSets.jl**

A Julia package that provides functionality for state space sets. These are ordered collections of points of fixed length (called dimension). It is used by many other packages in the JuliaDynamics organization. The main export of `StateSpaceSets` is the concrete type `StateSpaceSet`. The package also provides functionality for distances, neighbor searches, sampling, and normalization.

To install it you may run `import Pkg; Pkg.add("StateSpaceSets")`. However, there is no real reason to install this package directly, as it is re-exported by all downstream packages that use it.

The word "timeseries" can be confusing, because it can mean a univariate (also called scalar or one-dimensional) timeseries or a multivariate (also called multi-dimensional) timeseries. To resolve this confusion, in **DynamicalSystems.jl** we have the following convention: **"timeseries"** is always univariate! It refers to a one-dimensional vector of numbers, which exists with respect to some other one-dimensional vector of numbers that corresponds to a time vector. On the other hand, we use the word **"state space set"** to refer to a *multi-dimensional* timeseries, which is of course simply a group/set of one-dimensional timeseries represented as a `StateSpaceSet`.

## StateSpaceSet

Trajectories, and in general sets in state space, are represented by a structure called `StateSpaceSet` in **DynamicalSystems.jl** (while timeseries are always standard Julia `Vector`s). It is recommended to always `standardize` datasets.

`StateSpaceSets.StateSpaceSet`

— Type`StateSpaceSet{D, T, V} <: AbstractVector{V}`

A dedicated interface for sets in a state space. It is an **ordered container of equally-sized points** of length `D`, with element type `T`, represented by a vector of type `V`. Typically `V` is `SVector{D,T}` or `Vector{T}`, and the data are always stored internally as `Vector{V}`. `SSSet` is an alias for `StateSpaceSet`.

The underlying `Vector{V}` can be obtained by `vec(ssset)`, although this is almost never necessary because `StateSpaceSet` subtypes `AbstractVector` and extends its interface. `StateSpaceSet` also supports almost all sensible vector operations like `append!, push!, hcat, eachrow`, among others. When iterated over, it iterates over its contained points.

**Construction**

Constructing a `StateSpaceSet` is done in three ways:

- By giving each individual **column** of the state space set as a `Vector{<:Real}`: `StateSpaceSet(x, y, z, ...)`.
- By giving a matrix whose rows are the state space points: `StateSpaceSet(m)`.
- By giving directly a vector of vectors (state space points): `StateSpaceSet(v_of_v)`.

All constructors allow for the keyword `container`, which sets the type of `V` (the type of inner vectors). At the moment the only options are `SVector`, `MVector`, or `Vector`, and by default `SVector` is used.
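As a minimal sketch of the three construction routes listed above, the snippet below builds the same set from columns, from a matrix, and from a vector of vectors. The variable names (`x`, `y`, `m`, `v_of_v`, `A`, `B`, `C`, `D`) are illustrative only, and the data are random.

```julia
using StateSpaceSets

x, y = rand(100), rand(100)

# 1. From individual columns: one vector per variable.
A = StateSpaceSet(x, y)

# 2. From a matrix whose rows are the state space points.
m = hcat(x, y)                  # 100×2 matrix, one column per variable
B = StateSpaceSet(m)

# 3. From a vector of vectors: one inner vector per point.
v_of_v = [[x[i], y[i]] for i in eachindex(x)]
C = StateSpaceSet(v_of_v)

# The `container` keyword selects the type of the inner vectors.
D = StateSpaceSet(x, y; container = Vector)

# All routes describe the same 100 two-dimensional points.
A == B == C
```

Since the default container is `SVector`, the points of `A`, `B` and `C` are static vectors, while those of `D` are ordinary `Vector`s.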

**Description of indexing**

When indexed with 1 index, `StateSpaceSet` behaves exactly like its encapsulated vector, i.e., a vector of vectors (state space points). When indexed with 2 indices it behaves like a matrix where each row is a point.

In the following let `i, j` be integers, `typeof(X) <: AbstractStateSpaceSet` and `v1, v2` be `<: AbstractVector{Int}` (`v1, v2` could also be ranges, and for performance benefits make `v2` an `SVector{Int}`).

- `X[i] == X[i, :]` gives the `i`th point (returns an `SVector`).
- `X[v1] == X[v1, :]` returns a `StateSpaceSet` with the points in those indices.
- `X[:, j]` gives the `j`th variable timeseries (or collection), as a `Vector`.
- `X[v1, v2], X[:, v2]` returns a `StateSpaceSet` with the appropriate entries (the first indices being the "time"/point index, the second being variables).
- `X[i, j]` gives the value of the `j`th variable at the `i`th timepoint.

Use `Matrix(ssset)` or `StateSpaceSet(matrix)` to convert. It is assumed that each *column* of the `matrix` is one variable. If you have various timeseries vectors `x, y, z, ...` pass them like `StateSpaceSet(x, y, z, ...)`. You can use `columns(dataset)` to obtain the reverse, i.e. all columns of the dataset in a tuple.
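The indexing and conversion rules above can be illustrated with a small hand-made set. This is only a sketch; the set `X` and the variable names are made up for the example.

```julia
using StateSpaceSets

# Three points with two variables each.
X = StateSpaceSet([1.0, 2.0, 3.0], [10.0, 20.0, 30.0])

p   = X[2]       # the 2nd point, holding (2.0, 20.0)
ts  = X[:, 1]    # the 1st variable as a Vector: [1.0, 2.0, 3.0]
s   = X[3, 2]    # the 2nd variable at the 3rd point: 30.0
sub = X[1:2]     # a StateSpaceSet holding the first two points

M = Matrix(X)        # 3×2 matrix, one column per variable
x, y = columns(X)    # back to the individual timeseries
```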

In essence a `StateSpaceSet` is simply a wrapper for a `Vector` of `SVector`s. However, it is visually represented as a matrix, similarly to how numerical data would be printed on a spreadsheet (with time being the *column* direction). It also offers a lot more functionality than just pretty-printing. Besides the examples in the documentation string, you can e.g. iterate over data points:

```
using DynamicalSystems
hen = Systems.henon()
# `trajectory` returns the state space set together with the time vector
data, t = trajectory(hen, 10000)
for point in data
    # `point` is a single state space point
end
```

Most functions from **DynamicalSystems.jl** that manipulate or use multidimensional data expect a `StateSpaceSet`.

## StateSpaceSet accesses

`StateSpaceSets.minima`

— Function`minima(dataset)`

Return an `SVector` that contains the minimum elements of each timeseries of the dataset.

`StateSpaceSets.maxima`

— Function`maxima(dataset)`

Return an `SVector` that contains the maximum elements of each timeseries of the dataset.

`StateSpaceSets.minmaxima`

— Function`minmaxima(dataset)`

Return `minima(dataset), maxima(dataset)` without doing the computation twice.

`StateSpaceSets.columns`

— Function`columns(ssset) -> x, y, z, ...`

Return the individual columns of the state space set allocated as `Vector`s. Equivalent to `collect(eachcol(ssset))`.
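A short sketch of these accessors on a hand-made set follows. The set `X` and the names `lo`, `hi`, `mi`, `ma` are hypothetical and chosen only for illustration.

```julia
using StateSpaceSets

X = StateSpaceSet([1.0, 2.0, 3.0], [30.0, 10.0, 20.0])

lo = minima(X)          # per-variable minima: (1.0, 10.0)
hi = maxima(X)          # per-variable maxima: (3.0, 30.0)
mi, ma = minmaxima(X)   # both at once, in a single pass over the data

x, y = columns(X)       # the two variables as plain Vectors
```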

## Basic statistics

`StateSpaceSets.standardize`

— Function`standardize(d::StateSpaceSet) → r`

Create a standardized version of the input set where each column is transformed to have mean 0 and standard deviation 1.

`standardize(x::AbstractVector{<:Real}) = (x - mean(x))/std(x)`

`Statistics.cor`

— Function`cor(d::StateSpaceSet) → m::SMatrix`

Compute the correlation matrix `m` from the columns of `d`, where `m[i, j]` is the correlation between `d[:, i]` and `d[:, j]`.

`Statistics.cov`

— Function`cov(d::StateSpaceSet) → m::SMatrix`

Compute the covariance matrix `m` from the columns of `d`, where `m[i, j]` is the covariance between `d[:, i]` and `d[:, j]`.

`StateSpaceSets.mean_and_cov`

— Function`mean_and_cov(d::StateSpaceSet) → μ, m::SMatrix`

Return a tuple of the column means `μ` and covariance matrix `m`.

Column means are always computed for the covariance matrix, so this is faster than computing both quantities separately.
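The statistics functions above could be used as in the following sketch. The data are random and the variable names are illustrative. Loading the `Statistics` standard library is an assumption made here so that `mean`, `std`, `cov` and `cor` are certainly in scope.

```julia
using StateSpaceSets
using Statistics

# Two variables with very different offsets and scales.
X = StateSpaceSet(3 .* randn(1000) .+ 5, 0.1 .* randn(1000))

# After standardizing, each column has mean 0 and standard deviation 1
# (up to floating point error).
S = standardize(X)
m1, s1 = mean(S[:, 1]), std(S[:, 1])

C = cov(X)               # 2×2 covariance matrix
R = cor(X)               # 2×2 correlation matrix, ones on the diagonal
μ, Σ = mean_and_cov(X)   # column means and covariance in one call
```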

## StateSpaceSet distances

### Two datasets

`StateSpaceSets.set_distance`

— Function`set_distance(ssset1, ssset2 [, distance])`

Calculate a distance between two `StateSpaceSet`s, i.e., a distance defined between sets of points, as dictated by `distance`.

Possible `distance` types are:

- `Centroid`, which is the default, and hundreds of times faster than the rest
- `Hausdorff`
- `StrictlyMinimumDistance`
- Any function `f(A, B)` that returns the distance between two state space sets `A, B`.

`StateSpaceSets.Hausdorff`

— Type`Hausdorff(metric = Euclidean())`

A distance that can be used in `set_distance`. The Hausdorff distance is the greatest of all the distances from a point in one set to the closest point in the other set. The distance is calculated with the metric given to `Hausdorff`, which defaults to Euclidean.

`Hausdorff` is 2x slower than `StrictlyMinimumDistance`; however, it is a proper metric in the space of sets of state space sets.

This metric only works for `StateSpaceSet`s whose elements are `SVector`s.

For developers: `set_distance` can take keywords `tree1, tree2` that are the KDTrees of the first and second sets respectively.

`StateSpaceSets.Centroid`

— Type`Centroid(metric = Euclidean())`

A distance that can be used in `set_distance`. The `Centroid` method returns the distance (according to `metric`) between the centroids (a.k.a. centers of mass) of the sets.

`metric` can be any function that takes in two static vectors and returns a non-negative number to use as a distance (and typically is a `Metric` from Distances.jl).

`StateSpaceSets.StrictlyMinimumDistance`

— Type`StrictlyMinimumDistance([brute = false,] [metric = Euclidean(),])`

A distance that can be used in `set_distance`. The `StrictlyMinimumDistance` returns the minimum distance of all the distances from a point in one set to the closest point in the other set. The distance is calculated with the given metric.

The `brute::Bool` argument switches the computation between a KDTree-based version and brute force (i.e., calculation of all distances and picking the smallest one). Brute force performs better for sets that are either high-dimensional or have a small number of points. Deciding a cutting point is not trivial, and it is recommended to simply benchmark the `set_distance` function to make a decision.

If `brute = false` this metric only works for `StateSpaceSet`s whose elements are `SVector`s.

For developers: `set_distance` can take a keyword `tree2` that is the KDTree of the second set.
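To make the difference between the three distance types concrete, here is a sketch on two hand-made sets: the corners of the unit square, and the same corners shifted by 5 along the first axis. The sets `A` and `B` are made up for the example, and the values in the comments follow from the definitions above with the default Euclidean metric.

```julia
using StateSpaceSets

A = StateSpaceSet([0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0])
B = StateSpaceSet([5.0, 6.0, 5.0, 6.0], [0.0, 0.0, 1.0, 1.0])

# Centroids are (0.5, 0.5) and (5.5, 0.5), so their distance is 5.
d_default  = set_distance(A, B)              # `Centroid()` is the default
d_centroid = set_distance(A, B, Centroid())

# Closest pair across the two sets is (1, 0) and (5, 0): distance 4.
d_min = set_distance(A, B, StrictlyMinimumDistance())

# The worst-case nearest-neighbor distance is from (0, 0) to (5, 0): 5.
d_haus = set_distance(A, B, Hausdorff())
```

For well-separated sets like these the three values are close; they diverge most when the sets overlap or have very different shapes.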

### Sets of datasets

`StateSpaceSets.setsofsets_distances`

— Function`setsofsets_distances(a₊, a₋ [, distance]) → distances`

Calculate distances between sets of `StateSpaceSet`s. Here `a₊, a₋` are containers of `StateSpaceSet`s, and the returned distances are dictionaries of distances. Specifically, `distances[i][j]` is the distance of the set in the `i` key of `a₊` to the `j` key of `a₋`. Distances from `a₋` to `a₊` are not computed at all, assuming symmetry in the distance function.

The `distance` can be anything valid for `set_distance`.
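A possible usage, assuming the containers are dictionaries mapping integer keys to sets, is sketched below. The names `unit_square`, `shifted`, `a_plus` and `a_minus` are hypothetical.

```julia
using StateSpaceSets

unit_square = StateSpaceSet([0.0, 1.0, 0.0, 1.0], [0.0, 0.0, 1.0, 1.0])
shifted     = StateSpaceSet([5.0, 6.0, 5.0, 6.0], [0.0, 0.0, 1.0, 1.0])

a_plus  = Dict(1 => unit_square, 2 => shifted)
a_minus = Dict(1 => unit_square)

# distances[i][j]: distance from the set at key i of a_plus
# to the set at key j of a_minus.
distances = setsofsets_distances(a_plus, a_minus, Centroid())

d11 = distances[1][1]   # same set on both sides, so the centroids coincide
d21 = distances[2][1]   # centroids are 5 apart along the first axis
```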

## StateSpaceSet I/O

Input/output functionality for an `AbstractStateSpaceSet` is already available through Julia's `DelimitedFiles` standard library, specifically `writedlm` and `readdlm`. To write and read a dataset, simply do:

```
using DelimitedFiles
data = StateSpaceSet(rand(1000, 2))
# I will write and read using delimiter ','
writedlm("data.txt", data, ',')
# Don't forget to convert the matrix to a StateSpaceSet when reading
data = StateSpaceSet(readdlm("data.txt", ',', Float64))
```

## Neighborhoods

Neighborhoods refer to the common act of finding points in a dataset that are nearby a given point (which typically belongs in the dataset). **DynamicalSystems.jl** bases this interface on Neighborhood.jl. You can go to its documentation if you are interested in finding neighbors in a dataset for e.g. a custom algorithm implementation.

For **DynamicalSystems.jl**, what is relevant are the two types of neighborhoods that exist:

`Neighborhood.NeighborNumber`

— Type`NeighborNumber(k::Int) <: SearchType`

Search type representing the `k` nearest neighbors of the query (or approximate neighbors, depending on the search structure).

`Neighborhood.WithinRange`

— Type`WithinRange(r::Real) <: SearchType`

Search type representing all neighbors with distance `≤ r` from the query (according to the search structure's metric).

## Samplers

`StateSpaceSets.statespace_sampler`

— Function`statespace_sampler(region [, seed = 42]) → sampler, isinside`

A function that facilitates sampling points randomly and uniformly in a state space `region`. It generates two functions:

- `sampler` is a 0-argument function that when called generates a random point inside a state space `region`. The point is always a `Vector` for type stability irrespective of dimension. Generally, the generated point should be *copied* if it needs to be stored (i.e., calling `sampler()` utilizes a shared vector). `sampler` is a thread-safe function.
- `isinside` is a 1-argument function that returns `true` if the given state space point is inside the `region`.

The `region` can be an instance of any of the following types (input arguments if not specified are vectors of length `D`, with `D` the state space dimension):

- `HSphere(radius::Real, center)`: points *inside* the hypersphere (boundary excluded). Convenience method `HSphere(radius::Real, D::Int)` makes the center a `D`-long vector of zeros.
- `HSphereSurface(radius, center)`: points on the hypersphere surface. Same convenience method as above is possible.
- `HRectangle(mins, maxs)`: points in [min, max) for the bounds along each dimension.

The random number generator is always `Xoshiro` with the given `seed`.

`statespace_sampler(grid::NTuple{N, AbstractRange} [, seed])`

If given a `grid` that is a tuple of `AbstractVector`s, the minimum and maximum of the vectors are used to make an `HRectangle` region.
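A sketch of how the sampler might be used follows. The regions, the seed value `1234`, and the variable names are arbitrary choices for illustration.

```julia
using StateSpaceSets

# Uniform sampling in the rectangle [0, 1) × [-1, 1).
region = HRectangle([0.0, -1.0], [1.0, 1.0])
sampler, isinside = statespace_sampler(region, 1234)

# `sampler()` reuses a shared vector, so copy the point before storing it.
p = copy(sampler())
inside = isinside(p)

# Sampling inside a 3-dimensional ball of radius 2 centered at the origin.
ball_sampler, in_ball = statespace_sampler(HSphere(2.0, 3))
points = [copy(ball_sampler()) for _ in 1:100]
all_inside = all(in_ball, points)
```

Without the `copy`, every entry of `points` would refer to the same shared vector and hold only the last sample.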

`StateSpaceSets.HSphere`

— Type

```
HSphere(r::Real, center::AbstractVector)
HSphere(r::Real, D::Int)
```

A state space region denoting all points *within* a hypersphere.

`StateSpaceSets.HSphereSurface`

— Type

```
HSphereSurface(r::Real, center::AbstractVector)
HSphereSurface(r::Real, D::Int)
```

A state space region denoting all points *on the surface* (boundary) of a hypersphere.

`StateSpaceSets.HRectangle`

— Type`HRectangle(mins::AbstractVector, maxs::AbstractVector)`

A state space region denoting all points *within* the hyperrectangle.