# S-measure

`CausalityTools.SMeasure.s_measure` — Function

```julia
s_measure(x::AbstractDataset, y::AbstractDataset;
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean()) → Float64
s_measure(x::AbstractVector, y::AbstractVector;
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dx::Int = 2, dy::Int = 2, τx::Int = 1, τy::Int = 1) → Float64
s_measure(x::AbstractVector, y::AbstractDataset;
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dx::Int = 2, τx::Int = 1) → Float64
s_measure(x::AbstractDataset, y::AbstractVector;
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dy::Int = 2, τy::Int = 1) → Float64
```

S-measure [Grassberger1999][Quiroga2000] for the directional dependence between univariate/multivariate time series `x` and `y`.

Returns a scalar `s ∈ [0, 1]`, where `s = 0` indicates independence between `x` and `y`, and higher values indicate synchronization between `x` and `y`, with complete synchronization for `s = 1.0`.

**Input data**

The algorithm is slightly modified from [Grassberger1999] to allow univariate time series as input. In that case, the time series are embedded using the parameters `dx`/`τx` (dimension/lag for time series `x`) or `dy`/`τy` (dimension/lag for time series `y`). For custom embeddings, do the embedding yourself and input `Dataset`s as both `x` and `y`.

- If `x` and `y` are `dx`- and `dy`-dimensional `Dataset`s, respectively, then use `x` and `y` as is.
- If `x` and `y` are scalar time series, then create `dx`- and `dy`-dimensional embeddings, respectively, of both `x` and `y`, resulting in `N` different `m`-dimensional embedding points $X = \{x_1, x_2, \ldots, x_N \}$ and $Y = \{y_1, y_2, \ldots, y_N \}$. `τx` and `τy` control the embedding lags for `x` and `y`.
- If `x` is a scalar-valued vector and `y` is a `Dataset`, or vice versa, then create an embedding of the scalar time series using the parameters `dx`/`τx` or `dy`/`τy`.

In all three cases, input datasets are length-matched by eliminating points at the end of the longest dataset (after the embedding step, if relevant) before analysis.
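For instance, a custom-embedding call might look like the following sketch. It assumes DelayEmbeddings.jl's `genembed` for the embedding step; the lags `(0, 2, 4)` and `(0, 1, 2)` and the random input data are arbitrary choices for illustration:

```julia
using CausalityTools, DelayEmbeddings

x, y = rand(1000), rand(1000)
# Custom 3-dimensional embeddings: lags (0, 2, 4) for `x`, (0, 1, 2) for `y`
X = genembed(x, (0, 2, 4))
Y = genembed(y, (0, 1, 2))
# Both inputs are now `Dataset`s, so no embedding keywords are needed
s = s_measure(X, Y, K = 3)
```

Since both arguments are `Dataset`s, the first method above is dispatched to, and the embedding parameters are ignored entirely.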

**Description**

Let $r_{i,j}$ and $s_{i,j}$, $j = 1, \ldots, K$, denote the time indices of the `K` nearest neighbors of $x_i$ and $y_i$, respectively.

- Compute the mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, using the indices $r_{i,j}$:

\[R_i^{(k)}(x) = \dfrac{1}{k} \sum_{j=1}^{k}(x_i - x_{r_{i,j}})^2\]

- Compute the y-conditioned mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, now using the indices $s_{i,j}$:

\[R_i^{(k)}(x|y) = \dfrac{1}{k} \sum_{j=1}^{k}(x_i - x_{s_{i,j}})^2\]

- Define the following measure of interdependence, where $0 \leq S \leq 1$; low values indicate independence, and values close to one occur for synchronized signals:

\[S^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{R_i^{(k)}(x)}{R_i^{(k)}(x|y)}\]
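To make the steps concrete, here is a brute-force sketch of the procedure for scalar (already one-dimensional) inputs. `smeasure_naive` is a hypothetical name, not the package's implementation, which uses efficient KD-tree neighbor searches via `tree_metric`:

```julia
# Brute-force S-measure sketch for scalar inputs (hypothetical helper,
# not part of CausalityTools).
function smeasure_naive(x::AbstractVector, y::AbstractVector; K::Int = 2)
    N = length(x)
    # For each point i, indices of its K nearest neighbors (excluding i itself)
    knn(v) = [partialsortperm([j == i ? Inf : abs2(v[i] - v[j]) for j in 1:N], 1:K)
              for i in 1:N]
    r, s = knn(x), knn(y)
    # R_i(x): mean squared distance to x's own K nearest neighbors
    Rx  = [sum(abs2, x[i] .- x[r[i]]) / K for i in 1:N]
    # R_i(x|y): same, but using the neighbor indices found in y
    Rxy = [sum(abs2, x[i] .- x[s[i]]) / K for i in 1:N]
    return sum(Rx ./ Rxy) / N
end

x = rand(300)
y = x .+ 0.01 .* rand(300)   # nearly identical signals: strongly synchronized
s_strong = smeasure_naive(x, y, K = 4)
s_indep = smeasure_naive(rand(300), rand(300), K = 4)
```

Because the $r_{i,j}$ are the true nearest neighbors of $x_i$, each term satisfies $R_i^{(k)}(x) \leq R_i^{(k)}(x|y)$, so $S$ never exceeds one; here `s_strong` should be much larger than `s_indep`.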

**Example**

```julia
using CausalityTools
# A two-dimensional Ulam lattice map
sys = ulam(2)
# Sample 1000 points after discarding 5000 transients
orbit = trajectory(sys, 1000, Ttr = 5000)
x, y = orbit[:, 1], orbit[:, 2]
# 4-dimensional embedding for `x`, 5-dimensional embedding for `y`
s_measure(x, y, dx = 4, τx = 3, dy = 5, τy = 1)
```

## Example: random data vs Henon maps

Here, we'll compute the S-measure both between random time series (uniform noise) and between time series of a dynamical system (coupled Henon maps).

We start by generating the time series.

```julia
using CausalityTools, DynamicalSystems, Plots, Distributions
npts, Ttr = 10000, 5000
x, y = columns(trajectory(henon2(c_xy = 0.87), npts - 1, Ttr = Ttr))
xr, yr = rand(Uniform(-1, 1), npts), rand(Uniform(-1, 1), npts)
[x y xr yr]
```

```
10000×4 Matrix{Float64}:
-1.74964 1.77648 0.103752 0.0316809
-1.12828 -1.74964 -0.51945 0.906668
-0.397911 -1.12828 -0.461962 0.160849
0.903182 -0.397911 -0.0463781 0.935012
0.464889 0.903182 -0.078328 -0.798098
1.45483 0.464889 0.711305 -0.325035
-0.577073 1.45483 0.756307 0.343001
1.50344 -0.577073 -0.472869 0.224606
-1.03344 1.50344 -0.097294 0.604994
0.783024 -1.03344 -0.505048 -0.912011
⋮
0.507192 -1.17617 0.730132 0.387063
0.789907 0.507192 0.200391 -0.439615
0.928205 0.789907 0.0125071 0.128228
0.775407 0.928205 0.0973673 -0.140331
1.07721 0.775407 0.653623 -0.355214
0.472251 1.07721 -0.845574 -0.87317
1.50014 0.472251 -0.66321 -0.00962741
-0.708748 1.50014 0.628075 0.0627294
1.34772 -0.708748 -0.866241 0.201507
```

Let's plot the time series.

```julia
p_det = plot(xlabel = "", ylabel = "Value", title = "Coupled Henon maps")
plot!(x[1:100], label = "x", marker = stroke(:black), c = :black)
plot!(y[1:100], label = "y", marker = stroke(:red), c = :red)
p_rand = plot(xlabel = "Time step", ylabel = "Value", title = "Random time series")
plot!(xr[1:100], label = "xr", c = :blue)
plot!(yr[1:100], label = "yr", c = :purple)
plot(p_det, p_rand, layout = grid(2, 1), size = (382*2, 400), legend = :bottomright,
    tickfont = font(13), guidefont = font(13), legendfont = font(13))
```

Now we compute the S-measure between the random time series, both from `x` to `y` and from `y` to `x`. We'll also do the same for the Henon maps.

The test parameters are the embedding dimensions (`dx` for the source and `dy` for the target), the embedding lags (`τx` for the source and `τy` for the target), and the number of nearest neighbors `K`. We'll compute the test with fixed embedding parameters, but a varying number of nearest neighbors (`ks = 2:10`).

For the sake of demonstration, we'll use the same embedding parameters both for the source and for the target, for all analyses. For a real use case, embedding parameters should be chosen more carefully, and will, in general, be different for the source and for the target.
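One common way to choose embedding parameters is sketched below, assuming DelayEmbeddings.jl's `estimate_delay` and `delay_afnn` (Cao's averaged false-nearest-neighbor statistic); the noisy sine signal is a stand-in for a real time series:

```julia
using DelayEmbeddings

# A noisy oscillation as a stand-in for real data
x = sin.(0.1 .* (1:2000)) .+ 0.1 .* randn(2000)
# Embedding lag from the first minimum of the self-mutual information
τ = estimate_delay(x, "mi_min")
# Cao's statistic for candidate dimensions; pick the smallest d
# beyond which E stops changing appreciably
ds = 2:8
E = delay_afnn(x, τ, ds)
```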

```julia
dx, dy = 5, 5
τx, τy = 1, 1
# Compute the s-measures for different values of k
ks = 2:10
ss_r_xy = [s_measure(xr, yr, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_r_yx = [s_measure(yr, xr, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_henon_xy = [s_measure(x, y, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_henon_yx = [s_measure(y, x, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
plot(xlabel = "# nearest neighbors (k)", ylabel = "S", ylims = (-0.05, 1.05))
plot!(ks, ss_r_xy, label = "random uncoupled system (x -> y)", marker = stroke(2), c = :black)
plot!(ks, ss_r_yx, label = "random uncoupled system (y -> x)", marker = stroke(2), c = :red)
plot!(ks, ss_henon_xy, marker = stroke(2), label = "henon unidir (x -> y)")
plot!(ks, ss_henon_yx, marker = stroke(2), label = "henon unidir (y -> x)")
```

For uncoupled time series, we expect the value of $S$ to be close to zero. For strongly coupled time series, the value of $S$ should be nonzero and approaching 1. This is exactly what we get: for the random time series, the value of $S$ is close to zero, and for the Henon maps, it's clearly nonzero.

Note that the actual dynamical coupling in the Henon system is unidirectional from `x` to `y`. The results (positive $S$ in both directions), however, indicate that the coupling is bidirectional, with the coupling being stronger in one direction. This disagreement between results and ground truth highlights the importance of employing *causal hypothesis testing*. For this, we could use TimeseriesSurrogates.jl to generate surrogate time series. An in-depth example of how to use surrogate testing can be found in the transfer entropy example.

- [Grassberger1999] Arnhold, J., Grassberger, P., Lehnertz, K., & Elger, C. E. (1999). A robust method for detecting interdependences: application to intracranially recorded EEG. Physica D: Nonlinear Phenomena, 134(4), 419–430.
- [Quiroga2000] Quian Quiroga, R., Arnhold, J., & Grassberger, P. (2000). Learning driver-response relationships from synchronization patterns. Phys. Rev. E, 61(5), 5142–5148.