S-measure

CausalityTools.SMeasure.s_measure — Function
s_measure(x::AbstractDataset, y::AbstractDataset; 
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean()) → Float64
s_measure(x::AbstractVector, y::AbstractVector; 
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dx::Int = 2, dy::Int = 2, τx::Int = 1, τy::Int = 1) → Float64
s_measure(x::AbstractVector, y::AbstractDataset;
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dx::Int = 2, τx::Int = 1) → Float64
s_measure(x::AbstractDataset, y::AbstractVector; 
    K::Int = 2, metric = SqEuclidean(), tree_metric = Euclidean(),
    dy::Int = 2, τy::Int = 1) → Float64

Compute the S-measure [Grassberger1999][Quiroga2000] for the directional dependence between the univariate/multivariate time series x and y.

Returns a scalar s ∈ [0, 1], where s = 0 indicates independence between x and y, and higher values indicate synchronization between x and y, with complete synchronization for s = 1.0.

Input data

The algorithm is slightly modified from [Grassberger1999] to allow univariate time series as input. In that case, the time series are embedded using the parameters dx/τx (dimension/lag for time series x) and dy/τy (dimension/lag for time series y). For custom embeddings, do the embedding yourself and input Datasets as both x and y (a minimal sketch of this follows the list below).

  • If x and y are dx and dy-dimensional Datasets, respectively, then use x and y as is.
  • If x and y are scalar time series, then create dx- and dy-dimensional embeddings of x and y, respectively, resulting in the embedding points $X = \{x_1, x_2, \ldots, x_N \}$ and $Y = \{y_1, y_2, \ldots, y_N \}$. τx and τy control the embedding lags for x and y.
  • If x is a scalar-valued vector and y is a Dataset, or vice versa, then create an embedding of the scalar time series using parameters dx/τx or dy/τy.

In all three cases, input datasets are length-matched by eliminating points at the end of the longest dataset (after the embedding step, if relevant) before analysis.
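
For the custom-embedding route, a minimal sketch could look like the following. It assumes the embed function from DelayEmbeddings.jl (re-exported by DynamicalSystems.jl); the dimensions and lags are arbitrary choices for illustration.

using CausalityTools, DelayEmbeddings

x, y = rand(1000), rand(1000)

# Do the delay embeddings yourself (dimension 3, lag 2 for x; dimension 2, lag 1 for y),
# then pass the resulting Datasets directly, bypassing the automatic embedding step.
X = embed(x, 3, 2)
Y = embed(y, 2, 1)

s_measure(X, Y, K = 3)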

Description

  1. Let $r_{i,j}$ and $s_{i,j}$, $j = 1, 2, \ldots, K$, denote the indices of the $K$ nearest neighbors of $x_i$ and $y_i$, respectively.

  2. Compute the mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, using the indices $r_{i,j}$.

\[R_i^{(k)}(x) = \dfrac{1}{k} \sum_{j=1}^{k}(x_i - x_{r_{i,j}})^2\]

  3. Compute the y-conditioned mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, now using the indices $s_{i,j}$.

\[R_i^{(k)}(x|y) = \dfrac{1}{k} \sum_{j=1}^{k}(x_i - x_{s_{i,j}})^2\]

  4. Define the following measure of independence, where $0 \leq S \leq 1$. Low values indicate independence, and values close to one occur for synchronized signals.

\[S^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{R_i^{(k)}(x)}{R_i^{(k)}(x|y)}\]
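
To make these steps concrete, here is a minimal brute-force sketch of the computation above. The helper name s_measure_bruteforce and the toy embedding are ours, for illustration only; the package implementation instead uses tree-based neighbor searches with the tree_metric given in the signatures above.

# Brute-force sketch of the S-measure for two equal-length collections of
# points `X` and `Y` (vectors of vectors), using k nearest neighbors.
function s_measure_bruteforce(X, Y; k::Int = 2)
    N = length(X)
    S = 0.0
    for i in 1:N
        # Squared Euclidean distances from x_i (resp. y_i) to all other points;
        # the point itself is excluded by giving it an infinite distance.
        distsx = [j == i ? Inf : sum(abs2, X[i] .- X[j]) for j in 1:N]
        distsy = [j == i ? Inf : sum(abs2, Y[i] .- Y[j]) for j in 1:N]
        r = partialsortperm(distsx, 1:k)  # indices r_{i,j}: k nearest neighbors of x_i
        s = partialsortperm(distsy, 1:k)  # indices s_{i,j}: k nearest neighbors of y_i
        Rx  = sum(distsx[r]) / k                           # R_i^(k)(x)
        Rxy = sum(sum(abs2, X[i] .- X[j]) for j in s) / k  # R_i^(k)(x|y)
        S += Rx / Rxy
    end
    return S / N
end

# Toy usage: 2-dimensional embeddings of two weakly related series.
x = rand(500); y = 0.5 .* x .+ 0.5 .* rand(500)
X = [[x[t], x[t + 1]] for t in 1:499]
Y = [[y[t], y[t + 1]] for t in 1:499]
s_measure_bruteforce(X, Y, k = 3)

Because the indices $r_{i,j}$ by definition minimize the mean squared distance from $x_i$, we always have $R_i^{(k)}(x) \leq R_i^{(k)}(x|y)$, which is what confines $S$ to the interval $[0, 1]$.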

Example

using CausalityTools

# A two-dimensional Ulam lattice map
sys = ulam(2)

# Sample 1000 points after discarding 5000 transients
orbit = trajectory(sys, 1000, Ttr = 5000)
x, y = orbit[:, 1], orbit[:, 2]

# 4-dimensional embedding for `x`, 5-dimensional embedding for `y`
s_measure(x, y, dx = 4, τx = 3, dy = 5, τy = 1)

Example: random data vs Henon maps

Here, we'll compute the S-measure both between random time series (uniform noise) and between time series from a dynamical system (unidirectionally coupled Henon maps).

We start by generating the time series.

using CausalityTools, DynamicalSystems, Plots, Distributions
npts, Ttr = 10000, 5000
x, y = columns(trajectory(henon2(c_xy = 0.87), npts - 1, Ttr = Ttr))
xr, yr = rand(Uniform(-1, 1), npts), rand(Uniform(-1, 1), npts)
[x y xr yr]
10000×4 Matrix{Float64}:
 -1.74964    1.77648    0.103752    0.0316809
 -1.12828   -1.74964   -0.51945     0.906668
 -0.397911  -1.12828   -0.461962    0.160849
  0.903182  -0.397911  -0.0463781   0.935012
  0.464889   0.903182  -0.078328   -0.798098
  1.45483    0.464889   0.711305   -0.325035
 -0.577073   1.45483    0.756307    0.343001
  1.50344   -0.577073  -0.472869    0.224606
 -1.03344    1.50344   -0.097294    0.604994
  0.783024  -1.03344   -0.505048   -0.912011
  ⋮                                
  0.507192  -1.17617    0.730132    0.387063
  0.789907   0.507192   0.200391   -0.439615
  0.928205   0.789907   0.0125071   0.128228
  0.775407   0.928205   0.0973673  -0.140331
  1.07721    0.775407   0.653623   -0.355214
  0.472251   1.07721   -0.845574   -0.87317
  1.50014    0.472251  -0.66321    -0.00962741
 -0.708748   1.50014    0.628075    0.0627294
  1.34772   -0.708748  -0.866241    0.201507

Let's plot the time series.

p_det = plot(xlabel = "", ylabel = "Value", title = "Coupled Henon maps")
plot!(x[1:100], label = "x", marker = stroke(:black), c = :black)
plot!(y[1:100], label = "y", marker = stroke(:red), c = :red)
p_rand = plot(xlabel = "Time step", ylabel = "Value", title = "Random time series")
plot!(xr[1:100], label = "xr", c = :blue)
plot!(yr[1:100], label = "yr", c = :purple)

plot(p_det, p_rand, layout = grid(2, 1), size = (382*2, 400), legend = :bottomright,
    tickfont = font(13), guidefont = font(13), legendfont = font(13))

Now we compute the S-measure between the random time series, both from x to y and from y to x. We'll also do the same for the Henon maps.

The test parameters are embedding dimensions (dx for the source and dy for the target), the embedding lags (τx for the source and τy for the target), and the number of nearest neighbors K. We'll compute the test with fixed embedding parameters, but a varying number of nearest neighbors (ks = 2:10).

For the sake of demonstration, we'll use the same embedding parameters both for the source and for the target, for all analyses. For a real use case, embedding parameters should be chosen more carefully, and will, in general, be different for the source and for the target.

dx, dy = 5, 5
τx, τy = 1, 1

# Compute the s-measures for different values of k
ks = 2:10
ss_r_xy = [s_measure(xr, yr, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_r_yx = [s_measure(yr, xr, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_henon_xy = [s_measure(x, y, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]
ss_henon_yx = [s_measure(y, x, dx = dx, τx = τx, dy = dy, τy = τy, K = k) for k in ks]

plot(xlabel = "# nearest neighbors (k)", ylabel = "S", ylims = (-0.05, 1.05))
plot!(ks, ss_r_xy,  label = "random uncoupled system (x -> y)", marker = stroke(2), c = :black)
plot!(ks, ss_r_yx,  label = "random uncoupled system (y -> x)", marker = stroke(2), c = :red)
plot!(ks, ss_henon_xy, marker = stroke(2), label = "henon unidir (x -> y)")
plot!(ks, ss_henon_yx, marker = stroke(2), label = "henon unidir (y -> x)")

For uncoupled time series, we expect the value of $S$ to be close to zero. For strongly coupled time series, the value of $S$ should be nonzero and approach 1. This is exactly what we get: for the random time series, the value of $S$ is close to zero, while for the Henon maps, it is clearly nonzero.

Note that the actual dynamical coupling in the Henon system is unidirectional from x to y. The results (positive $S$ in both directions), however, indicate that the coupling is bidirectional, with the coupling being stronger in one direction. This disagreement between results and ground truth highlights the importance of employing causal hypothesis testing. For this, we could use TimeseriesSurrogates.jl to generate surrogate time series. An in-depth example on how to use surrogate testing can be found in the transfer entropy example.
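
As a rough sketch of such a surrogate test, assuming the surrogate function and the RandomShuffle method from TimeseriesSurrogates.jl (the number of surrogates and the quantile threshold are illustrative choices, not recommendations):

using CausalityTools, TimeseriesSurrogates, Statistics

# Observed S-measure from x to y, using the same embedding parameters as above.
s_obs = s_measure(x, y, dx = dx, τx = τx, dy = dy, τy = τy, K = 5)

# S-measures where the source is replaced by random-shuffle surrogates, which
# destroy the temporal structure of x while keeping its amplitude distribution.
s_surr = [s_measure(surrogate(x, RandomShuffle()), y,
    dx = dx, τx = τx, dy = dy, τy = τy, K = 5) for _ in 1:100]

# Simple criterion: is the observed value above the 95th percentile of the
# surrogate distribution?
s_obs > quantile(s_surr, 0.95)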

  • [Quiroga2000] Quian Quiroga, R., Arnhold, J., & Grassberger, P. (2000). Learning driver-response relationships from synchronization patterns. Phys. Rev. E, 61(5), 5142–5148.
  • [Grassberger1999] Arnhold, J., Grassberger, P., Lehnertz, K., & Elger, C. E. (1999). A robust method for detecting interdependences: application to intracranially recorded EEG. Physica D: Nonlinear Phenomena, 134(4), 419–430.