Cross-map API

Estimators

Associations.CrossmapEstimator — Type

CrossmapEstimator{M<:CrossmapMeasure, LIBSIZES, RNG}

The abstract supertype for all cross-map estimators.

Concrete subtypes

RandomVectors
RandomSegment
ExpandingSegment

Description

Because the type of the library may differ between estimators, and because RNGs from different packages may be used, subtypes must implement the LIBSIZES and RNG type parameters.

For efficiency purposes, subtypes may contain mutable containers that can be re-used for ensemble analysis (see Ensemble).

Libraries

A cross-map estimator uses the concept of "libraries". A library is essentially just a reference to a set of points, and usually, a library refers to indices of points, not the actual points themselves.

For example, for timeseries, RandomVectors(libsizes = 50:25:100) produces three separate libraries, where the first contains 50 randomly selected time indices, the second contains 75 randomly selected time indices, and the third contains 100 randomly selected time indices. This of course assumes that all quantities involved can be indexed using the same time indices, meaning that the concept of "library" only makes sense after relevant quantities have been jointly embedded, so that they can be jointly indexed. For non-instantaneous prediction, the maximum possible library size shrinks with the magnitude of the index/time-offset for the prediction.

For spatial analyses (not yet implemented), indices could be more complex and involve multi-indices.

source

Associations.RandomVectors — Type

RandomVectors <: CrossmapEstimator
RandomVectors(definition::CrossmapMeasure; libsizes, replace = false, 
    rng = Random.default_rng())

Cross map once over N = length(libsizes) different "point libraries", where point indices are selected randomly (not considering time ordering).

This is method 3 from Luo et al. (2015). See CrossmapEstimator for an in-depth explanation of what "library" means in this context.

Description

The cardinality of the point libraries are given by libsizes. One set of random point indices is selected per L ∈ libsizes, and the i-th library has cardinality k = libsizes[i].

Point indices within each library are randomly selected, independently of other libraries. A user-specified rng may be specified for reproducibility. The replace argument controls whether sampling is done with or without replacement. If the time series you're cross mapping between have length M, and Lᵢ < M for any Lᵢ ∈ libsizes, then you must set replace = true.

Returns

The return type when used with association depends on the type of libsizes.

If libsizes is an Int (a single library), then a single cross-map estimate is returned.
If libsizes is an AbstractVector{Int} (multiple libraries), then a vector of cross-map estimates is returned –- one per library.

Advanced utility methods

For most use cases, it is sufficient to provide a CrossmapEstimator to association to compute a cross map measure. However, in some cases it can be useful to have more fine-grained controls. We offer a few utility functions for this purpose.

These functions are used in the examples below, where we reproduce Figures 3C and 3D of Sugihara et al. (2012) and reproduce figures from McCracken and Weigel (2014).

Associations.predict — Function

predict(measure::CrossmapEstimator, t::AbstractVector, s::AbstractVector) → t̂ₛ, t̄, ρ
predict(measure::CrossmapEstimator, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → t̂ₛ

Perform point-wise cross mappings between source embeddings and target time series according to the algorithm specified by the given cross-map measure (e.g. ConvergentCrossMapping or PairwiseAsymmetricInference).

First method: Jointly embeds the target t and source s time series (according to measure) to obtain time-index aligned target timeseries t̄ and source embedding S̄ (which is now a StateSpaceSet). Then calls predict(measure, t̄, S̄) (the first method), and returns both the predictions t̂ₛ, observations t̄ and their correspondence ρ according to measure.
Second method: Returns a vector of predictions t̂ₛ (t̂ₛ := "predictions of t̄ based on source embedding S̄"), where t̂ₛ[i] is the prediction for t̄[i]. It assumes pre-embedded data which have been correctly time-aligned using a joint embedding, i.e. such that t̄[i] and S̄[i] correspond to the same time index.

Description

For each i ∈ {1, 2, …, N} where N = length(t) == length(s), we make the prediction t̂[i] (an estimate of t[i]) based on a linear combination of D + 1 other points in t, where the selection of points and weights for the linear combination are determined by the D+1 nearest neighbors of the point S̄[i]. The details of point selection and weights depend on measure.

Note: Some CrossmapMeasures may define more general mapping procedures. If so, the algorithm is described in their docstring.

source

Associations.crossmap — Function

crossmap(measure::CrossmapEstimator, t::AbstractVector, s::AbstractVector) → ρ::Real
crossmap(measure::CrossmapEstimator, est, t::AbstractVector, s::AbstractVector) → ρ::Vector
crossmap(measure::CrossmapEstimator, t̄::AbstractVector, S̄::AbstractStateSpaceSet) → ρ

Compute the cross map estimates between between raw time series t and s (and return the real-valued cross-map statistic ρ). If a CrossmapEstimator est is provided, cross mapping is done on random subsamples of the data, where subsampling is dictated by est (a vector of values for ρ is returned).

Alternatively, cross-map between time-aligned time series t̄ and source embedding S̄ that have been constructed by jointly (pre-embedding) some input data.

This is just a wrapper around predict that simply returns the correspondence measure between the source and the target.

source

Example: reproducing Sugihara et al. (2012)

Run blocks consecutively

If copying these examples and running them locally, make sure the relevant packages (given in the first block) are loaded first.

Figure 3A

Let's reproduce figure 3A too, focusing only on ConvergentCrossMapping this time. In this figure, they compute the cross mapping for libraries of increasing size, always starting at time index 1. This approach - which we here call the ExpandingSegment estimator - is one of many ways of estimating the correspondence between observed and predicted value.

For this example, they use a bidirectional system with asymmetrical coupling strength.

using Associations
using Statistics
using LabelledArrays
using StaticArrays
using DynamicalSystemsBase
using StateSpaceSets
using CairoMakie, Printf

function eom_logistic_sugi(u, p, t)
    (; rx, ry, βxy, βyx) = p
    (; x, y) = u

    dx = x*(rx - rx*x - βxy*y)
    dy = y*(ry - ry*y - βyx*x)
    return SVector{2}(dx, dy)
end

# βxy := effect on x of y
# βyx := effect on y of x
function logistic_sugi(; u0 = rand(2), rx, ry, βxy, βyx)
    p = @LArray [rx, ry, βxy, βyx] (:rx, :ry, :βxy, :βyx)
    DiscreteDynamicalSystem(eom_logistic_sugi, u0, p)
end

# Used in `reproduce_figure_3A_naive`, and `reproduce_figure_3A_ensemble` below.
function add_to_fig!(fig_pos, libsizes, ρs_x̂y, ρs_ŷx; title = "", quantiles = false)
    ax = Axis(fig_pos; title, aspect = 1,
        xlabel = "Library size", ylabel = "Correlation (ρ)")
    ylims!(ax, (-1, 1))
    hlines!([0], linestyle = :dash, alpha = 0.5, color = :grey)
    scatterlines!(libsizes, median.(ρs_x̂y), label = "x̂|y", color = :blue)
    scatterlines!(libsizes, median.(ρs_ŷx), label = "ŷ|x", color = :red)
    if quantiles
        band!(libsizes, quantile.(ρs_x̂y, 0.05), quantile.(ρs_x̂y, 0.95), color = (:blue, 0.5))
        band!(libsizes, quantile.(ρs_ŷx, 0.05), quantile.(ρs_ŷx, 0.95), color = (:red, 0.5))
    end
    axislegend(ax, position = :rb)
end

function reproduce_figure_3A_naive(definition::CrossmapMeasure)
    sys_bidir = logistic_sugi(; u0 = [0.2, 0.4], rx = 3.7, ry = 3.700001, βxy = 0.02, βyx = 0.32);
    x, y = columns(first(trajectory(sys_bidir, 3100, Ttr = 10000)));
    libsizes = [20:2:50; 55:5:200; 300:50:500; 600:100:900; 1000:500:3000]
    est = ExpandingSegment(definition; libsizes);
    ρs_x̂y = crossmap(est, x, y)
    ρs_ŷx = crossmap(est, y, x)

    with_theme(theme_minimal(),
        markersize = 5) do
        fig = Figure(resolution = (800, 300))
        add_to_fig!(fig[1, 1], libsizes, ρs_x̂y, ρs_ŷx; title = "`ExpandingSegment`")
        fig
    end
end

reproduce_figure_3A_naive(ConvergentCrossMapping(d = 3))

Hm. This looks a bit like the paper, but the curve is not smooth. We can do better!

It is not clear from the paper exactly what they plot in their Figure 3A, if they plot an average of some kind, or precisely what parameters and initial conditions they use. However, we can get a smoother plot by using a Ensemble. Combined with a CrossmapEstimator, it uses Monte Carlo resampling on subsets of the input data to compute an ensemble of ρs that we here use to compute the median and 90-th percentile range for each library size.

function reproduce_figure_3A_ensemble(definition::CrossmapMeasure)
    sys_bidir = logistic_sugi(; u0 = [0.4, 0.2], rx = 3.8, ry = 3.5, βxy = 0.02, βyx = 0.1);
    x, y = columns(first(trajectory(sys_bidir, 5000, Ttr = 10000)));
    # Note: our time series are 1000 points long. When embedding, some points are
    # lost, so we must use slightly less points for the segments than
    # there are points in the original time series.
    libsizes = [20:5:50; 55:5:200; 300:50:500; 600:100:900; 1000:500:2000]
    # No point in doing more than one rep, because there data are always the same
    # for `ExpandingSegment.`
    ensemble_ev = Ensemble(ExpandingSegment(definition; libsizes); nreps = 1)
    ensemble_rs = Ensemble(RandomSegment(definition; libsizes); nreps = 30)
    ensemble_rv = Ensemble(RandomVectors(definition; libsizes); nreps = 30)
    ρs_x̂y_es = crossmap(ensemble_ev, x, y)
    ρs_ŷx_es = crossmap(ensemble_ev, y, x)
    ρs_x̂y_rs = crossmap(ensemble_rs, x, y)
    ρs_ŷx_rs = crossmap(ensemble_rs, y, x)
    ρs_x̂y_rv = crossmap(ensemble_rv, x, y)
    ρs_ŷx_rv = crossmap(ensemble_rv, y, x)

    with_theme(theme_minimal(),
        markersize = 5) do
        fig = Figure(resolution = (800, 300))
        add_to_fig!(fig[1, 1], libsizes, ρs_x̂y_es, ρs_ŷx_es; title = "`ExpandingSegment`", quantiles = false) # quantiles make no sense for `ExpandingSegment`
        add_to_fig!(fig[1, 2], libsizes, ρs_x̂y_rs, ρs_ŷx_rs; title = "`RandomSegment`", quantiles = true)
        add_to_fig!(fig[1, 3], libsizes, ρs_x̂y_rv, ρs_ŷx_rv; title = "`RandomVector`", quantiles = true)
        fig
    end
end

reproduce_figure_3A_ensemble(ConvergentCrossMapping(d = 3, τ = -1))

With the RandomVectors estimator, the mean of our ensemble ρs seem to look pretty much identical to Figure 3A in Sugihara et al. The RandomSegment estimator also performs pretty well, but since subsampled segments are contiguous, there are probably some autocorrelation effects at play.

We can avoid the autocorrelation issue by tuning the w parameter of the ConvergentCrossMapping measure, which is the Theiler window. Setting the Theiler window to w > 0, we can exclude neighbors of a query point p that are close to p in time, and thus deal with autocorrelation issues that way (the default w = 0 excludes only the point itself). Let's re-do the analysis with w = 5, just for fun.

reproduce_figure_3A_ensemble(ConvergentCrossMapping(d = 3, τ = -1, w = 5))

There wasn't really that much of a difference, since for the logistic map, the autocorrelation function flips sign for every lag increase. However, for examples from other systems, tuning w may be important.

Figure 3B

What about figure 3B? Here they generate time series of length 400 for a range of values for both coupling parameters, and plot the dominant direction $\Delta = \rho(\hat{x} | y) - \rho(\hat{y} | x)$.

In the paper, they use a 1000 different parameterizations for the logistic map parameters, but don't state what is summarized in the plot. For simplicity, we'll therefore just stick to rx = ry = 3.7, as in the examples above, and just loop over the coupling strengths in either direction.

function reproduce_figure_3B()
    βxys = 0.0:0.02:0.4
    βyxs = 0.0:0.02:0.4
    ρx̂ys = zeros(length(βxys), length(βyxs))
    ρŷxs = zeros(length(βxys), length(βyxs))

    for (i, βxy) in enumerate(βxys)
        for (j, βyx) in enumerate(βyxs)
            sys_bidir = logistic_sugi(; u0 = [0.2, 0.4], rx = 3.7, ry = 3.7, βxy, βyx);
            # Generate 1000 points. Randomly select a 400-pt long segment.
            x, y = columns(first(trajectory(sys_bidir, 400, Ttr = 10000)));
            definition = CCM(d = 3, w = 5, τ = -1)
            ensemble = Ensemble(RandomVectors(definition; libsizes = 100), nreps = 50)
            ρx̂ys[i, j] = mean(crossmap(ensemble, x, y))
            ρŷxs[i, j] = mean(crossmap(ensemble, y, x))
        end
    end
    Δ = ρŷxs .- ρx̂ys

    with_theme(theme_minimal(),
        markersize = 5) do
        fig = Figure();
        ax = Axis(fig[1, 1], xlabel = "βxy", ylabel = "βyx")
        cont = contourf!(ax, Δ, levels = range(-1, 1, length = 10),
            colormap = :curl)
        ax.xticks = 1:length(βxys), string.([i % 2 == 0 ? βxys[i] : "" for i in 1:length(βxys)])
        ax.yticks = 1:length(βyxs), string.([i % 2 == 0 ? βyxs[i] : "" for i in 1:length(βyxs)])
        Colorbar(fig[1 ,2], cont, label = "Δ (ρ(ŷ|x) - ρ(x̂|y))")
        tightlimits!(ax)
        fig
    end
end

reproduce_figure_3B()

Figures 3C and 3D

Let's reproduce figures 3C and 3D in Sugihara et al. (2012), which introduced the ConvergentCrossMapping measure. Equations and parameters can be found in their supplementary material. Simulatenously, we also compute the PairwiseAsymmetricInference measure from McCracken and Weigel (2014), which is a related method, but uses a slightly different embedding.

using Associations
using Statistics
using LabelledArrays
using StaticArrays
using DynamicalSystemsBase
using StateSpaceSets
using CairoMakie, Printf

# -----------------------------------------------------------------------------------------
# Create 500-point long time series for Sugihara et al. (2012)'s example for figure 3.
# -----------------------------------------------------------------------------------------
sys_unidir = logistic_sugi(; u0 = [0.2, 0.4], rx = 3.7, ry = 3.700001, βxy = 0.00, βyx = 0.32);
x, y = columns(first(trajectory(sys_unidir, 500, Ttr = 10000)));

# -----------------------------------------------------------------------------------------
# Cross map.
# -----------------------------------------------------------------------------------------
m_ccm = ConvergentCrossMapping(d = 2)
m_pai = PairwiseAsymmetricInference(d = 2)
# Make predictions x̂y, i.e. predictions `x̂` made from embedding of y (AND x, if PAI)
t̂ccm_x̂y, tccm_x̂y, ρccm_x̂y = predict(m_ccm, x, y)
t̂pai_x̂y, tpai_x̂y, ρpai_x̂y = predict(m_pai, x, y);
# Make predictions ŷx, i.e. predictions `ŷ` made from embedding of x (AND y, if PAI)
t̂ccm_ŷx, tccm_ŷx, ρccm_ŷx = predict(m_ccm, y, x)
t̂pai_ŷx, tpai_ŷx, ρpai_ŷx = predict(m_pai, y, x);

# -----------------------------------------------------------------------------------------
# Plot results
# -----------------------------------------------------------------------------------------
ρs = (ρccm_x̂y, ρpai_x̂y, ρccm_ŷx, ρpai_ŷx)
sccm_x̂y, spai_x̂y, sccm_ŷx, spai_ŷx = (map(ρ -> (@sprintf "%.3f" ρ), ρs)...,)

ρs = (ρccm_x̂y, ρpai_x̂y, ρccm_ŷx, ρpai_ŷx)
sccm_x̂y, spai_x̂y, sccm_ŷx, spai_ŷx = (map(ρ -> (@sprintf "%.3f" ρ), ρs)...,)

with_theme(theme_minimal(),
    markersize = 5) do
    fig = Figure();
    ax_ŷx = Axis(fig[2,1], aspect = 1, xlabel = "y(t) (observed)", ylabel = "ŷ(t) | x (predicted)")
    ax_x̂y = Axis(fig[2,2], aspect = 1, xlabel = "x(t) (observed)", ylabel = "x̂(t) | y (predicted)")
    xlims!(ax_ŷx, (0, 1)), ylims!(ax_ŷx, (0, 1))
    xlims!(ax_x̂y, (0, 1)), ylims!(ax_x̂y, (0, 1))
    ax_ts = Axis(fig[1, 1:2], xlabel = "Time (t)", ylabel = "Value")
    scatterlines!(ax_ts, x[1:300], label = "x")
    scatterlines!(ax_ts, y[1:300], label = "y")
    axislegend()
    scatter!(ax_ŷx, tccm_ŷx, t̂ccm_ŷx, label = "CCM (ρ = $sccm_ŷx)", color = :black)
    scatter!(ax_ŷx, tpai_ŷx, t̂pai_ŷx, label = "PAI (ρ = $spai_ŷx)", color = :red)
    axislegend(ax_ŷx, position = :lt)
    scatter!(ax_x̂y, tccm_x̂y, t̂ccm_x̂y, label = "CCM (ρ = $sccm_x̂y)", color = :black)
    scatter!(ax_x̂y, tpai_x̂y, t̂pai_x̂y, label = "PAI (ρ = $spai_x̂y)", color = :red)
    axislegend(ax_x̂y, position = :lt)
    fig
end

Example: reproducing McCracken & Weigel (2014)

Let's try to reproduce figure 8 from McCracken and Weigel (2014)'s paper on PairwiseAsymmetricInference (PAI). We'll start by defining the their example B (equations 6-7). This system consists of two variables $X$ and $Y$, where $X$ drives $Y$.

After we have computed the PAI in both directions, we define a measure of directionality as the difference between PAI in the $X \to Y$ direction and in the $Y \to X$ direction, so that if $X$ drives $Y$, then $\Delta < 0$.

using Associations
using LabelledArrays
using StaticArrays
using DynamicalSystemsBase
using StateSpaceSets
using CairoMakie, Printf
using Distributions: Normal
using Statistics: mean, std

function eom_nonlinear_sindriver(dx, x, p, n)
    a, b, c, t, Δt = (p...,)
    x, y = x[1], x[2]
    𝒩 = Normal(0, 1)

    dx[1] = sin(t)
    dx[2] = a*x * (1 - b*x) + c* rand(𝒩)
    p[end-1] += 1 # update t

    return
end

function nonlinear_sindriver(;u₀ = rand(2), a = 1.0, b = 1.0, c = 2.0, Δt = 1)
    DiscreteDynamicalSystem(eom_nonlinear_sindriver, u₀, [a, b, c, 0, Δt])
end

function reproduce_figure_8_mccraken(;
        c = 2.0, Δt = 0.2,
        as = 0.5:0.5:5.0,
        bs = 0.5:0.5:5.0)
    # -----------------------------------------------------------------------------------------
    # Generate many time series for many different values of the parameters `a` and `b`,
    # and compute PAI. This will replicate the upper right panel of
    # figure 8 in McCracken & Weigel (2014).
    # -----------------------------------------------------------------------------------------

    measure = PairwiseAsymmetricInference(d = 3)

    # Manually resample `nreps` length-`L` time series and use mean ρ(x̂|X̄y) - ρ(ŷ|Ȳx)
    # for each parameter combination.
    nreps = 50
    L = 200 # length of timeseries
    Δ = zeros(length(as), length(bs))
    for (i, a) in enumerate(as)
        for (j, b) in enumerate(bs)
            s = nonlinear_sindriver(; a, b, c,  Δt)
            x, y = columns(first(trajectory(s, 1000, Ttr = 10000)))
            Δreps = zeros(nreps)
            for i = 1:nreps
                # Ensure we're subsampling at the same time indices.
                ind_start = rand(1:(1000-L))
                r = ind_start:(ind_start + L)
                Δreps[i] = @views crossmap(measure, y[r], x[r]) -
                    crossmap(measure, x[r], y[r])
            end
            Δ[i, j] = mean(Δreps)
        end
    end

    # -----------------------------------------------------------------------------------------
    # An example time series for plotting.
    # -----------------------------------------------------------------------------------------
    sys = nonlinear_sindriver(; a = 1.0, b = 1.0, c, Δt)
    npts = 500
    orbit = first(trajectory(sys, npts, Ttr = 10000))
    x, y = columns(orbit)
    with_theme(theme_minimal(),
        markersize = 5) do

        X = x[1:300]
        Y = y[1:300]
        fig = Figure();
        ax_ts = Axis(fig[1, 1:2], xlabel = "Time (t)", ylabel = "Value")
        scatterlines!(ax_ts, (X .- mean(X)) ./ std(X), label = "x")
        scatterlines!(ax_ts, (Y .- mean(Y)) ./ std(Y), label = "y")
        axislegend()

        ax_hm = Axis(fig[2, 1:2], xlabel = "a", ylabel = "b")
        ax_hm.yticks = (1:length(as), string.([i % 2 == 0 ? as[i] : "" for i = 1:length(as)]))
        ax_hm.xticks = (1:length(bs), string.([i % 2 == 0 ? bs[i] : "" for i = 1:length(bs)]))
        hm = heatmap!(ax_hm, Δ,  colormap = :viridis)
        Colorbar(fig[2, 3], hm; label = "Δ' = ρ(ŷ | yx) - ρ(x̂ | xy)")
        fig
    end
end

reproduce_figure_8_mccraken()

We haven't used as many parameter combinations as McCracken and Weigel (2014) did, but we get a figure that looks roughly similar to theirs.

As expected, $\Delta < 0$ for all parameter combinations, implying that $X$ "PAI drives" $Y$.