# Numerical Data

The word "timeseries" can be confusing, because it can mean a univariate (also called scalar or one-dimensional) timeseries or a multivariate (also called multi-dimensional) timeseries. To resolve this confusion, **DynamicalSystems.jl** uses the following convention: a **"timeseries"** is always univariate. It refers to a one-dimensional vector of numbers that exists with respect to some other one-dimensional vector of numbers, the time vector. On the other hand, we use the word **"dataset"** to refer to a *multi-dimensional* timeseries, which is of course simply a group/set of one-dimensional timeseries represented as a `Dataset`. In some documentation strings we use the word "trajectory" instead of "dataset", meaning an ordered multivariate timeseries. This is typically the output of the function `trajectory`, or the delay embedding of a timeseries via `embed`, both of which are also represented as a `Dataset`.

## Datasets

Trajectories, and in general sets in state space, are represented by a structure called `Dataset` in **DynamicalSystems.jl** (while timeseries are always standard Julia `Vector`s). It is recommended to always `standardize` datasets.

`DelayEmbeddings.Dataset` (Type)

`Dataset{D, T} <: AbstractDataset{D,T}`

A dedicated interface for datasets. It contains *equally-sized datapoints* of length `D`, represented by `SVector{D, T}`. These data are a standard Julia `Vector{SVector}`, and can be obtained with `vec(dataset)`.

When indexed with 1 index, a `dataset` is like a vector of datapoints. When indexed with 2 indices it behaves like a matrix whose columns are the timeseries of the individual variables.

`Dataset` also supports most sensible operations like `append!, push!, hcat, eachrow`, among others, and when iterated over, it iterates over its contained points.

**Description of indexing**

In the following let `i, j` be integers, `typeof(data) <: AbstractDataset` and `v1, v2` be `<: AbstractVector{Int}` (`v1, v2` could also be ranges, and for massive performance benefits make `v2` an `SVector{X, Int}`).

- `data[i] == data[i, :]` gives the `i`th datapoint (returns an `SVector`)
- `data[v1] == data[v1, :]` returns a `Dataset` with the points in those indices
- `data[:, j]` gives the `j`th variable timeseries, as `Vector`
- `data[v1, v2], data[:, v2]` returns a `Dataset` with the appropriate entries (first indices being "time"/point index, while second being variables)
- `data[i, j]` gives the value of the `j`th variable at the `i`th timepoint
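To make the indexing rules above concrete, here is a minimal, self-contained sketch in plain Julia. `MiniDataset` is a hypothetical type written only for illustration, not how `Dataset` is implemented: it stores points as `NTuple{D, T}` rather than `SVector{D, T}` so that it needs no packages, and it covers only the integer, colon and index-vector cases listed above (not `data[v1, v2]`).

```julia
# Hypothetical minimal stand-in for `Dataset`, NOT the library's implementation.
# Points are NTuple{D, T} instead of SVector{D, T} to avoid the StaticArrays.jl dependency.
struct MiniDataset{D, T}
    points::Vector{NTuple{D, T}}
end

# Build from a matrix whose columns are variables and whose rows are time points.
MiniDataset(M::AbstractMatrix) = MiniDataset([Tuple(r) for r in eachrow(M)])

Base.length(d::MiniDataset) = length(d.points)

# data[i] and data[i, :]: the i-th point.
Base.getindex(d::MiniDataset, i::Int) = d.points[i]
Base.getindex(d::MiniDataset, i::Int, ::Colon) = d.points[i]

# data[v1]: a new MiniDataset holding the points at those indices.
Base.getindex(d::MiniDataset, v::AbstractVector{Int}) = MiniDataset(d.points[v])

# data[:, j]: the timeseries of variable j, as a Vector.
Base.getindex(d::MiniDataset, ::Colon, j::Int) = [p[j] for p in d.points]

# data[i, j]: value of variable j at time point i.
Base.getindex(d::MiniDataset, i::Int, j::Int) = d.points[i][j]

# Iteration yields the points in time order.
Base.iterate(d::MiniDataset, state...) = iterate(d.points, state...)

d = MiniDataset([1.0 10.0; 2.0 20.0; 3.0 30.0])
d[2]      # (2.0, 20.0)
d[:, 2]   # [10.0, 20.0, 30.0]
d[3, 1]   # 3.0
```

Storing one tuple per point keeps each point contiguous in memory, which is the same motivation behind `Vector{SVector}`: row-wise access (one point) is cheap, while column access (`d[:, j]`) has to gather values across points.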

Use `Matrix(dataset)` or `Dataset(matrix)` to convert. It is assumed that each *column* of the `matrix` is one variable. If you have various timeseries vectors `x, y, z, ...` pass them like `Dataset(x, y, z, ...)`. You can use `columns(dataset)` to obtain the reverse, i.e. all columns of the dataset in a tuple.
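The column-per-variable convention can be illustrated without the library. The sketch below uses plain Julia tuples in place of `SVector`s; the helper names (`matrix_to_points`, `points_to_matrix`, `series_to_points`, `points_to_columns`) are hypothetical and only mirror what `Dataset(matrix)`, `Matrix(dataset)`, `Dataset(x, y, ...)` and `columns(dataset)` are described to do.

```julia
# Hypothetical helpers that mirror the conversions described above on plain Julia
# types (tuples stand in for SVectors); they are not part of DynamicalSystems.jl.

# Matrix -> points: each row of `M` is one point, each column one variable.
matrix_to_points(M::AbstractMatrix) = [Tuple(r) for r in eachrow(M)]

# Points -> Matrix: row t holds point t (the analogue of `Matrix(dataset)`).
points_to_matrix(pts::AbstractVector{<:Tuple}) =
    [p[j] for p in pts, j in 1:length(first(pts))]

# Separate timeseries x, y, z, ... -> points (the analogue of `Dataset(x, y, z, ...)`).
series_to_points(xs::AbstractVector...) = collect(zip(xs...))

# Points -> tuple of timeseries, one Vector per variable (the analogue of `columns`).
points_to_columns(pts::AbstractVector{<:Tuple}) =
    Tuple([p[j] for p in pts] for j in 1:length(first(pts)))

x = [1.0, 2.0, 3.0]; y = [4.0, 5.0, 6.0]
pts = series_to_points(x, y)      # [(1.0, 4.0), (2.0, 5.0), (3.0, 6.0)]
M = points_to_matrix(pts)         # 3×2 Matrix whose columns are x and y
points_to_columns(pts) == (x, y)  # true
```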

`DelayEmbeddings.standardize` (Function)

`standardize(d::Dataset) → r`

Create a standardized version of the input dataset where each timeseries (column) is transformed to have mean 0 and standard deviation 1.

`standardize(x::Vector) = (x .- mean(x)) ./ std(x)`
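The column-wise transformation can be sketched with the `Statistics` standard library on a plain matrix whose rows are time points. `standardize_columns` and `standardize_vec` are hypothetical names; the sketch uses `std` with its default corrected (`n - 1`) normalization, and whether the library uses the same convention is an assumption.

```julia
using Statistics

# Sketch: z-score every column (variable) of a matrix whose rows are time points.
# `std` uses its default corrected (n - 1) normalization here (an assumption).
function standardize_columns(M::AbstractMatrix{<:Real})
    μ = mean(M; dims = 1)   # 1×D row of column means
    σ = std(M; dims = 1)    # 1×D row of column standard deviations
    return (M .- μ) ./ σ    # broadcasting applies each column's μ and σ
end

# Single timeseries version, matching the one-liner above.
standardize_vec(x::AbstractVector{<:Real}) = (x .- mean(x)) ./ std(x)

standardize_vec([1.0, 2.0, 3.0])   # [-1.0, 0.0, 1.0]
```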

In essence a `Dataset` is simply a wrapper for a `Vector` of `SVector`s. However, it is displayed as a matrix, similarly to how numerical data would be printed on a spreadsheet (with time running along the *column* direction, i.e. each row is one point). It also offers much more functionality than pretty-printing. Besides the examples in the documentation string, you can, for example, iterate over the data points:

```julia
using DynamicalSystems
hen = Systems.henon()
data = trajectory(hen, 10000) # this returns a Dataset
for point in data
    # each `point` is an SVector holding one state of the system
end
```

Most functions from **DynamicalSystems.jl** that manipulate and use multidimensional data expect a `Dataset`. This allows us to define efficient methods that coordinate well with each other, such as `embed`.

## Dataset Functions

`DelayEmbeddings.minima` (Function)

`minima(dataset)`

Return an `SVector` that contains the minimum elements of each timeseries of the dataset.

`DelayEmbeddings.maxima` (Function)

`maxima(dataset)`

Return an `SVector` that contains the maximum elements of each timeseries of the dataset.

`DelayEmbeddings.minmaxima` (Function)

`minmaxima(dataset)`

Return `minima(dataset), maxima(dataset)` without doing the computation twice.

`DelayEmbeddings.columns` (Function)

`columns(dataset) -> x, y, z, ...`

Return the individual columns of the dataset.
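"Without doing the computation twice" means both extremes can be collected in a single sweep over the points. The following is a hypothetical sketch of that idea on a `Vector` of tuples (standing in for a `Dataset`); `minmaxima_sketch` is not a library function and the library's own implementation may differ.

```julia
# Sketch of a single-pass `minmaxima`: one sweep over the points updates both
# the running minimum and maximum of every variable. Points are NTuples here.
function minmaxima_sketch(points::Vector{NTuple{D, T}}) where {D, T<:Real}
    isempty(points) && throw(ArgumentError("need at least one point"))
    mins = collect(first(points))   # running minima, one per variable
    maxs = collect(first(points))   # running maxima, one per variable
    for p in points, j in 1:D
        mins[j] = min(mins[j], p[j])
        maxs[j] = max(maxs[j], p[j])
    end
    return Tuple(mins), Tuple(maxs)
end

minmaxima_sketch([(1.0, 5.0), (-2.0, 7.0), (3.0, 4.0)])   # ((-2.0, 4.0), (3.0, 7.0))
```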

## Dataset I/O

Input/output functionality for an `AbstractDataset` is already provided by Julia's standard library `DelimitedFiles`, specifically `writedlm` and `readdlm`. To write and read a dataset, simply do:

```julia
using DynamicalSystems # provides `Dataset`
using DelimitedFiles   # provides `writedlm` and `readdlm`
data = Dataset(rand(1000, 2))
# Write and read using the delimiter ','
writedlm("data.txt", data, ',')
# `readdlm` returns a Matrix, so convert it back to a Dataset
data = Dataset(readdlm("data.txt", ',', Float64))
```

## Neighborhoods

Neighborhoods refer to the common act of finding points in a dataset that are near a given point (which typically belongs to the dataset). **DynamicalSystems.jl** bases this interface on Neighborhood.jl. See its documentation if you are interested in finding neighbors in a dataset for, e.g., a custom algorithm implementation.

For **DynamicalSystems.jl**, what matters are the two available types of neighborhood search:

`Neighborhood.NeighborNumber` (Type)

`NeighborNumber(k::Int) <: SearchType`

Search type representing the `k` nearest neighbors of the query (or approximate neighbors, depending on the search structure).

`Neighborhood.WithinRange` (Type)

`WithinRange(r::Real) <: SearchType`

Search type representing all neighbors with distance `≤ r` from the query (according to the search structure's metric).
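The meaning of the two search types can be illustrated with a brute-force linear scan under the Euclidean metric. This is only a sketch of the semantics: `knn_indices` and `within_range_indices` are hypothetical names, and Neighborhood.jl answers such queries with search structures (e.g. trees) rather than a full scan.

```julia
using LinearAlgebra: norm

# Brute-force sketch of the two search types, Euclidean metric.
# Function names are hypothetical; real search structures avoid the full scan.

# NeighborNumber-like: indices of the k points closest to `query`, nearest first.
function knn_indices(points::Vector{<:AbstractVector}, query::AbstractVector, k::Int)
    dists = [norm(p - query) for p in points]
    return partialsortperm(dists, 1:k)
end

# WithinRange-like: indices of all points with distance ≤ r from `query`.
function within_range_indices(points::Vector{<:AbstractVector}, query::AbstractVector, r::Real)
    return [i for (i, p) in enumerate(points) if norm(p - query) ≤ r]
end

pts = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [5.0, 5.0]]
knn_indices(pts, [0.0, 0.0], 2)             # [1, 2] (the query point itself is included)
within_range_indices(pts, [0.0, 0.0], 2.0)  # [1, 2, 3]
```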

## Theiler window

The Theiler window is a concept that is useful when finding neighbors in a dataset that comes from sampling a continuous dynamical system. As demonstrated in the figure below, it tries to eliminate spurious "correlations" (wrongly counted neighbors) due to a potentially dense sampling of the trajectory (e.g. by using a small sampling time in `trajectory`).

The figure below demonstrates a typical `WithinRange` search around the black point with index `i`. Black, red and green points are found neighbors, but points with indices `j` that satisfy `|i-j| ≤ w` should *not* be counted as "true" neighbors. Such temporal neighbors exist around *any* state space point, and thus wrongly bias calculations by providing a non-zero baseline of neighbors. For the sketch below, `w=3` would have been used.
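Applying a Theiler window amounts to discarding every candidate `j` with `|i - j| ≤ w` before counting neighbors. The brute-force sketch below (Euclidean metric, hypothetical function name `theiler_neighbors`) shows the effect on a tiny trajectory that stays near its start for a few steps, leaves, and later returns.

```julia
using LinearAlgebra: norm

# Sketch: WithinRange-style search around the point with index `i`, skipping
# every index j with |i - j| ≤ w (the Theiler window). With w = 0 only the
# query point itself is excluded. Brute force; the name is hypothetical.
function theiler_neighbors(points::Vector{<:AbstractVector}, i::Int, r::Real, w::Int)
    q = points[i]
    return [j for (j, p) in enumerate(points) if abs(i - j) > w && norm(p - q) ≤ r]
end

pts = [[0.0], [0.1], [0.2], [5.0], [0.05]]
theiler_neighbors(pts, 1, 0.3, 0)  # [2, 3, 5]: includes the temporally adjacent points 2 and 3
theiler_neighbors(pts, 1, 0.3, 2)  # [5]: only the later return to the same region
```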

Typically, a good choice for `w` coincides with the choice of an optimal delay time (see `estimate_delay`) for any of the timeseries of the dataset.
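As an illustration of how such a delay (and hence `w`) might be obtained, the sketch below implements one common heuristic: the first lag at which the autocorrelation of a timeseries drops to or below zero. This is an assumption for illustration only; `first_zero_autocorrelation` is a hypothetical name, and `estimate_delay` offers its own methods, which may differ from this one.

```julia
using Statistics

# Sketch of one common delay heuristic: the first lag τ at which the
# (biased) autocorrelation estimate of `x` is ≤ 0. Hypothetical helper,
# not necessarily what `estimate_delay` does.
function first_zero_autocorrelation(x::AbstractVector{<:Real}; maxlag::Int = length(x) - 1)
    y = x .- mean(x)
    c0 = sum(abs2, y)                      # lag-0 normalization
    for τ in 1:maxlag
        cτ = sum(y[t] * y[t + τ] for t in 1:length(y) - τ) / c0
        cτ ≤ 0 && return τ
    end
    return nothing                         # no zero crossing up to maxlag
end

# A sine with a period of 40 samples decorrelates after roughly a quarter period.
x = [sin(2π * t / 40) for t in 0:399]
w = first_zero_autocorrelation(x)
```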