Binned resampling
CausalityToolsBase.causality
— Method.
causality(x::AbstractUncertainIndexValueDataset,
y::AbstractUncertainIndexValueDataset,
test::BinnedDataCausalityTest)
Apply a causality test to x and y, which are both data series with uncertainties in both indices and values. To get the data on an equally-spaced temporal grid, the data are first binned according to the instructions in test.binning.
Binning methods
- If test.binning results in an uncertain value for each bin, then the causality test is applied test.n_realizations times. Examples are UncertainData.BinnedResampling and UncertainData.BinnedWeightedResampling, which both return a KDE estimate of the distribution of values in each bin.
- If test.binning returns a summary statistic for each bin, then the causality test is applied to the summarised time series. Examples are UncertainData.BinnedMeanResampling and UncertainData.BinnedMeanWeightedResampling, which both return the mean of each bin.
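The difference between the two kinds of binning can be illustrated without either package. The following is a minimal sketch, not the UncertainData implementation: `bin_index`, `points` and `draw_realization` are hypothetical names, the uncertain points are assumed to be uniformly distributed, and the kernel density estimate is replaced by plain resampling of the draws pooled in each bin.

```julia
using Random, Statistics

# Hypothetical helper (not part of UncertainData): index of the bin containing
# `t`, given left bin edges `edges`. Returns 0 if `t` falls outside the grid.
function bin_index(t, edges)
    (t < first(edges) || t >= last(edges)) && return 0
    return searchsortedlast(edges, t)
end

rng = MersenneTwister(1234)
edges = 0:10:30      # three bins: [0, 10), [10, 20), [20, 30)

# Toy uncertain points: a uniform range for the index `t` and for the value `v`.
points = [(t = (2.0, 8.0),   v = (0.0, 1.0)),
          (t = (12.0, 18.0), v = (1.0, 2.0)),
          (t = (22.0, 28.0), v = (2.0, 3.0))]

# Draw each point `n_draws` times and assign every drawn value to its bin.
n_draws = 1000
bins = [Float64[] for _ in 1:length(edges) - 1]
for p in points, _ in 1:n_draws
    t = p.t[1] + rand(rng) * (p.t[2] - p.t[1])   # draw an index
    v = p.v[1] + rand(rng) * (p.v[2] - p.v[1])   # draw a value
    i = bin_index(t, edges)
    i > 0 && push!(bins[i], v)
end

# Case 1: every bin keeps a distribution. A realization of the binned series
# picks one value per bin, so a test can be repeated over many realizations.
draw_realization() = [rand(rng, b) for b in bins]
ensemble = [draw_realization() for _ in 1:5]

# Case 2: every bin is summarised by its mean, so there is only one series.
summarised = mean.(bins)
```

In the first case a causality test would be run once per element of `ensemble`; in the second case it would be run once, on `summarised`.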
Example
Some example data
Let's say we have the following datasets X and Y:
using CausalityTools, UncertainData
sys = ar1_unidir(uᵢ = [0.1, 0.1], c_xy = 0.41)
vars = (1, 2) # ar1_unidir has only two variables, X and Y
n_steps = 100 # the number of points in the time series
tstep = 10 # the mean of each time value is stepped by `tstep`
X, Y = example_uncertain_indexvalue_datasets(sys, n_steps, vars, tstep = tstep,
d_xind = Uniform(7.5, 15.5),
d_yind = Uniform(5.5, 15.5),
d_xval = Uniform(0.1, 0.5));
Now we can perform the causality tests on our uncertain data, either over the bin means, or over an ensemble of independent realisations of the dataset given the distribution of points in each bin.
Summary statistic for each bin
Predictive asymmetry test
If the binning method returns a summary statistic (e.g. the mean) for each bin, then the causality test is applied exactly once to the summarised time series.
# Bin the data by drawing n_draws = 5000 realizations of each uncertain data
# point, then assign the draws to the correct bins and compute the mean of
# the values in each bin.
binning = BinnedMeanResampling(0:10:1000, 5000)
# A predictive asymmetry test. For the predictions, we'll use a transfer entropy
# test, which uses the visitation frequency estimator. We'll predict 5 time steps
# forwards and backwards in time (ηs = -5:5). For the state space binning, we'll
# determine how many intervals to split each axis into from the number of points
# in the time series.
k, l, m = 1, 1, 1 # embedding parameters
n_subdivisions = floor(Int, length(0:10:1000)^(1/(k + l + m + 1)))
state_space_binning = RectangularBinning(n_subdivisions)
ηs = -5:5
te_test = VisitationFrequencyTest(k = k, l = l, m = m,
binning = state_space_binning,
ηs = ηs, b = 2) # use base-2 logarithms
pa_test = PredictiveAsymmetryTest(predictive_test = te_test)
# Compute the predictive asymmetry in both directions for the bin means.
# The number of realisations must still be specified, but it is ignored
# when the bins are summarised, so its value does not matter here.
test = BinnedDataCausalityTest(pa_test, binning, 1)
# Perform the tests
tes_xy = causality(X, Y, test)
tes_yx = causality(Y, X, test)
# Plot the results
using Plots
plot(xlabel = "Prediction lag (eta)", ylabel = "Predictive asymmetry (bits)")
plot!(ηs[ηs .> 0], tes_xy, label = "x -> y (binned)", c = :black)
plot!(ηs[ηs .> 0], tes_yx, label = "y -> x (binned)", c = :red)
hline!([0], lw = 2, ls = :dot, α = 0.5, label = "", c = :grey)
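As a quick check of the rule used above for the state space binning, the number of subdivisions can be worked out by hand for the two time grids that appear on this page. The helper name `subdivisions` is hypothetical and only wraps the expression from the example.

```julia
# Hypothetical helper wrapping the expression used in the example:
# floor(N^(1 / (k + l + m + 1))), where N is the number of grid points.
subdivisions(N, k, l, m) = floor(Int, N^(1 / (k + l + m + 1)))

k, l, m = 1, 1, 1   # embedding parameters, as in the example
for grid in (0:10:1000, 0:5:1000)
    N = length(grid)
    println("N = ", N, " gives n_subdivisions = ", subdivisions(N, k, l, m))
end
# N = 101 gives n_subdivisions = 3
# N = 201 gives n_subdivisions = 3
```

With k = l = m = 1 the exponent is 1/4, and both 101 and 201 lie between 3^4 = 81 and 4^4 = 256, so each axis is split into three intervals for either grid.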
Uncertain values representing each bin
If the binning method returns an uncertain value for each bin, then the causality test is applied to test.n_realizations independent draws of the binned dataset.
Predictive asymmetry test
# Bin the data by drawing n_draws = 5000 realizations of each uncertain data
# point, then assign the draws to the correct bins and get a distribution
# of values in each bin.
grid = 0:5:1000
binning = BinnedResampling(grid, 5000)
# A predictive asymmetry test. For the predictions, we'll use a transfer entropy
# test, which uses the visitation frequency estimator. We'll predict 5 time steps
# forwards and backwards in time (ηs = -5:5). For the state space binning, we'll
# determine how many intervals to split each axis into from the number of points
# in the time series.
k, l, m = 1, 1, 1 # embedding parameters
n_subdivisions = floor(Int, length(grid)^(1/(k + l + m + 1)))
state_space_binning = RectangularBinning(n_subdivisions)
ηs = -5:5
te_test = VisitationFrequencyTest(k = k, l = l, m = m,
binning = state_space_binning,
ηs = ηs, b = 2) # use base-2 logarithms
pa_test = PredictiveAsymmetryTest(predictive_test = te_test)
# Compute the predictive asymmetry in both directions over 50 independent
# realizations of the binned dataset.
n_realizations = 50
test = BinnedDataCausalityTest(pa_test, binning, n_realizations)
# Perform the tests. `tes_xy` now contains 50 independent computations
# of the predictive asymmetry at prediction lags 1 to 5, and `tes_yx`
# contains the same for the other direction.
tes_xy = causality(X, Y, test)
tes_yx = causality(Y, X, test)
# Gather results in a matrix (one column per realization) and compute means
# and standard deviations of the predictive asymmetries at each prediction lag
using Statistics
M_xy = hcat(tes_xy...)
M_yx = hcat(tes_yx...)
means_xy = mean(M_xy, dims = 2)[:, 1]
means_yx = mean(M_yx, dims = 2)[:, 1]
stdevs_xy = std(M_xy, dims = 2)[:, 1]
stdevs_yx = std(M_yx, dims = 2)[:, 1]
# Plot the predictive asymmetry as a function of prediction lag
plot(xlabel = "Prediction lag (eta)", ylabel = "Predictive asymmetry (bits)")
plot!(ηs[ηs .> 0], means_xy, ribbon = stdevs_xy, label = "x -> y (binned)", c = :black)
plot!(ηs[ηs .> 0], means_yx, ribbon = stdevs_yx, label = "y -> x (binned)", c = :red)
hline!([0], lw = 2, ls = :dot, α = 0.5, label = "", c = :grey)
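The aggregation step above can be tried in isolation. The following sketch uses synthetic numbers as a stand-in for the output of causality (one vector per realization, with one entry per positive prediction lag); the values and the name `results` are made up for illustration only.

```julia
using Random, Statistics

rng = MersenneTwister(42)
n_realizations, n_lags = 50, 5

# Synthetic stand-in for the test output: one vector per realization.
results = [0.1 .* (1:n_lags) .+ 0.01 .* randn(rng, n_lags) for _ in 1:n_realizations]

# One column per realization, one row per prediction lag.
M = reduce(hcat, results)

# Reducing along dims = 2 summarises across realizations for each lag.
lag_means = vec(mean(M, dims = 2))
lag_stdevs = vec(std(M, dims = 2))
```

Because each realization becomes a column, reducing along the second dimension gives one mean and one standard deviation per lag, which is the shape that the ribbon keyword in the plotting calls expects.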
Convergent cross mapping test
using Plots
# Bin the data by drawing n_draws = 5000 realizations of each uncertain data
# point, then assign the draws to the correct bins and get a distribution
# of values in each bin.
binning = BinnedResampling(0:10:1000, 5000)
# A convergent cross mapping causality test
ts_lengths = 15:5:100 |> collect
causalitytest = ConvergentCrossMappingTest(
timeseries_lengths = ts_lengths,
n_reps = 100,
libsize = 100,
τ = 1,
replace = true)
# Run ccm in both directions over 50 independent realizations of the binned datasets
n_realizations = 50
test = BinnedDataCausalityTest(causalitytest, binning, n_realizations)
ccms_xy = causality(X, Y, test);
ccms_yx = causality(Y, X, test);
realization_means_xy = [mean.(realization) for realization in ccms_xy]
realization_means_yx = [mean.(realization) for realization in ccms_yx]
M_xy = hcat(realization_means_xy...,)
M_yx = hcat(realization_means_yx...,)
plot(xlabel = "Time series length", ylabel = "Cross map skill")
plot!(ts_lengths, mean(M_xy, dims = 2)[:, 1],
ribbon = std(M_xy, dims = 2)[:, 1],
label = "x -> y (binned)")
plot!(ts_lengths, mean(M_yx, dims = 2)[:, 1],
ribbon = std(M_yx, dims = 2)[:, 1],
label = "y -> x (binned)")
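The reduction of the cross mapping results above involves one more level of nesting than in the predictive asymmetry example. The sketch below assumes, based only on how the results are indexed in the example, that each realization is a collection with one entry per time series length, and that each entry holds the n_reps cross map skills. The numbers are synthetic.

```julia
using Statistics

ts_lengths = 15:5:30    # four time series lengths
n_reps = 3

# Synthetic stand-in for two realizations: for each time series length,
# a vector of `n_reps` cross map skills.
ccms = [[fill(0.1 * r + 0.01 * j, n_reps) for j in 1:length(ts_lengths)] for r in 1:2]

# Average over repetitions within each realization...
realization_means = [mean.(realization) for realization in ccms]

# ...then stack realizations as columns and average across them.
M = reduce(hcat, realization_means)    # length(ts_lengths) × n_realizations
skill_mean = vec(mean(M, dims = 2))
skill_std = vec(std(M, dims = 2))
```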
BinnedDataCausalityTest
CausalityTools.CausalityTests.BinnedDataCausalityTest
— Type.
BinnedDataCausalityTest(test::CT, binning::AbstractBinnedUncertainValueResampling, n_realizations::Int)
BinnedDataCausalityTest(test::CT, binning::AbstractBinnedSummarisedResampling)
A causality test where the data are binned before applying the test. If binning is a binning scheme that returns an uncertain value for each bin (e.g. BinnedResampling or BinnedWeightedResampling), then the test is applied n_realizations times. If binning returns a single value for each bin (e.g. BinnedMeanResampling or BinnedMeanWeightedResampling), the test is applied only once.
Fields
- test::CausalityTest. An instance of a causality test, e.g. VisitationFrequencyTest, PredictiveAsymmetryTest, CrossMappingTest or JointDistanceDistributionTest.
- binning::AbstractBinnedResampling. An instance of a resampling scheme indicating the type of binning, e.g. BinnedResampling, BinnedWeightedResampling, BinnedMeanResampling or BinnedMeanWeightedResampling.
- n_realizations::Int. The number of independent draws of the binned dataset over which to apply the test. Only taken into consideration if binning is a scheme resulting in each bin being represented by an uncertain value. If bins are summarised, then the test is always applied only once.
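The rule for how many times the test is applied can be expressed through dispatch on the type of the binning scheme. The following is a minimal sketch of that idea with hypothetical toy types; it is not the CausalityTools source, and none of the names below exist in the package.

```julia
# Hypothetical stand-ins for the two families of binning schemes.
abstract type ToyBinning end
struct ToyUncertainBinning <: ToyBinning end    # one distribution per bin
struct ToySummarisedBinning <: ToyBinning end   # one number per bin

# Hypothetical stand-in for a binned causality test.
struct ToyBinnedTest{F, B <: ToyBinning}
    test::F               # any callable acting on two binned series
    binning::B
    n_realizations::Int
end

# The number of times the wrapped test runs is decided by the binning type.
n_applications(t::ToyBinnedTest{F, ToyUncertainBinning}) where {F} = t.n_realizations
n_applications(t::ToyBinnedTest{F, ToySummarisedBinning}) where {F} = 1

n_applications(ToyBinnedTest(identity, ToyUncertainBinning(), 50))    # 50
n_applications(ToyBinnedTest(identity, ToySummarisedBinning(), 50))   # 1
```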
Examples
Binned convergent cross mapping test
Let's say we want to bin the data by drawing n_draws = 5000 realizations of each uncertain data point, then assign the draws to the correct bins, and finally get a kernel density estimate of the distribution of values in each bin.
grid = 0:10:1000 # left bin edges
n_draws = 5000 # sample each point 5000 times and distribute the draws among the bins
binning = BinnedResampling(grid, n_draws)
ccm_test = ConvergentCrossMappingTest(timeseries_lengths = 15:5:100 |> collect,
n_reps = 100, libsize = 100, τ = 1, replace = true)
n_realizations = 50 # number of independent draws of the binned dataset (example value)
test = BinnedDataCausalityTest(ccm_test, binning, n_realizations)
Binned transfer entropy test
Let's say we want to bin the data by drawing n_draws = 7500 realizations of each uncertain data point, then assign the draws to the correct bins, and finally get a kernel density estimate of the distribution of values in each bin.
grid = 0:5:1000 # left bin edges
n_draws = 7500
binning = BinnedResampling(grid, n_draws)
# A transfer entropy causality test using the visitation frequency estimator
state_space_binning = RectangularBinning(5)
te_test = VisitationFrequencyTest(binning = state_space_binning, ηs = 1)
n_realizations = 50 # number of independent draws of the binned dataset (example value)
test = BinnedDataCausalityTest(te_test, binning, n_realizations)