Low-level transfer entropy estimators
For complete control over the estimation procedure, the analyst must create a delay reconstruction from the input data, specify a discretization scheme (either rectangular or triangulated), and map the variables of the delay reconstruction to the correct marginals.
The package provides some convenience methods to compute TE directly from time series; see the wrappers for regular TE and conditional TE. However, be absolutely sure that you understand what they do before applying them to real problems.
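For orientation, here is a compact, hedged sketch of that low-level workflow using the functions documented in this section (the long example below walks through each step in detail; the lags and bin counts here are arbitrary placeholders):

```julia
using TransferEntropy, DynamicalSystems

# Two toy time series; we sketch TE from x to y.
x, y = rand(200), rand(200)

# 1. Delay reconstruction: columns are (y(t+1), y(t), y(t-1), x(t)).
pts = customembed(Dataset(x, y), Positions(2, 2, 2, 1), Lags(1, 0, -1, 0))

# 2. Map the reconstruction variables to the TE marginals.
vars = TEVars(Tf = [1], Tpp = [2, 3], Spp = [4])

# 3. Discretize and estimate.
te = transferentropy(pts, vars, RectangularBinning(4), VisitationFrequency(b = 2))
```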
Estimators
Valid estimator types for rectangular partitions are:

- `VisitationFrequency`: an implementation of the original TE estimator from Schreiber (2000).
- `TransferOperatorGrid`: an implementation of the transfer operator grid estimator from Diego et al. (2019).
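Both estimators take a logarithm base through their `b` field, as described in the method documentation below; a minimal construction sketch:

```julia
# Constructing the two rectangular-partition estimators.
# The field `b` sets the logarithm base used in the computations.
est_vf  = VisitationFrequency(b = 2)                      # results in bits
est_tog = TransferOperatorGrid(b = Base.MathConstants.e)  # results in nats
```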
General workflow for TE estimation
The general workflow for estimating transfer entropy over rectangular partitions is as follows.
Rectangular partitions
To estimate transfer entropy over rectangular partitions, you would use the following method, providing either a `VisitationFrequency` or a `TransferOperatorGrid` instance to the `estimator` argument.
`TransferEntropy.transferentropy` — Method.
transferentropy(pts, vars::TEVars, ϵ::RectangularBinning,
estimator::TransferEntropyEstimator) -> Float64
Compute the transfer entropy for a set of `pts` over the state space partition specified by `ϵ` (a `RectangularBinning` instance).
Arguments
- `pts`: an ordered set of `m`-dimensional points representing an appropriate generalised embedding of some data series. Must be a vector of states, not a vector of variables/time series. Wrap your time series in a `DynamicalSystemsBase.Dataset` first if the latter is the case.
- `vars::TEVars`: a `TEVars` instance specifying how the `m` different variables of `pts` are to be mapped into the marginals required for transfer entropy computation.
- `ϵ::RectangularBinning`: a `RectangularBinning` instance that dictates how the point cloud (state space reconstruction) should be discretized.
- `estimator::TransferEntropyEstimator`: there are different ways of computing the transfer entropy over a discretization of a point cloud. The `estimator` should be a valid `TransferEntropyEstimator` that works for rectangular partitions, for example `VisitationFrequency()` or `TransferOperatorGrid()`. The field `estimator.b` sets the base of the logarithm used for the computations (e.g. `VisitationFrequency(b = 2)` computes the transfer entropy in bits using the `VisitationFrequency` estimator).
Returns
A single number: the transfer entropy for the discretization of `pts` using the binning scheme `ϵ`, computed with the provided `estimator`.
Long example
using TransferEntropy, DynamicalSystems
1. Generate some example time series
Let's generate two random noise series and use those as our time series.
x = rand(100)
y = rand(100)
We need `pts` to be a vector of states. Therefore, collect the time series in a `DynamicalSystems.Dataset` instance. This way, the states of the composite system will be represented as a `Vector{SVector}`.
raw_timeseries = Dataset(x, y)
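If you want to convince yourself of this layout, you can inspect the resulting dataset (a small, optional check):

```julia
# Each element of `raw_timeseries` is one state of the composite system,
# i.e. a 2-element SVector (x[t], y[t]).
raw_timeseries[1]       # the first state
length(raw_timeseries)  # 100 states in total
```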
2. Generalised embedding
Note: If your data are already organised in the form of a generalised embedding, where columns of the dataset correspond to lagged variables of the time series, you can skip to step 3.
Say we want to compute transfer entropy from x to y, and that we require a 4-dimensional embedding. For that, we need to decide on a generalised state space reconstruction of the time series. One possible choice is

$$E = \{ (y(t+\eta), \, y(t), \, y(t-\tau), \, x(t)) \},$$

where $\eta$ is the prediction lag and $\tau$ is the embedding lag. If so, we're computing the following TE:

$$TE_{x \rightarrow y} = \sum P(y(t+\eta), y(t), y(t-\tau), x(t)) \log{\left( \frac{P(y(t+\eta) \mid y(t), y(t-\tau), x(t))}{P(y(t+\eta) \mid y(t), y(t-\tau))} \right)}$$
To create the embedding, we'll use the `customembed` function (check its documentation for a detailed explanation of how it works). This is basically just making lagged copies of the time series and stacking them next to each other as column vectors. The order in which we arrange the lagged time series is not important per se, but we need to keep track of the ordering, because that information is crucial to the transfer entropy estimator.
According to the reconstruction we decided on above, we need to put the lagged time series for `y` (the target variable) in the first three columns. The lags for those columns are `η, 0, -τ`, in that order. Next, we need to put the time series for `x` (the source variable) in the fourth column (which is not lagged).
τ = optimal_delay(y) # find the optimal embedding lag
η = 2 # prediction lag (your choice as an analyst)
embedding_pts = customembed(raw_timeseries, Positions(2, 2, 2, 1), Lags(η, 0, -τ, 0))
The combination of the `Positions` instance and the `Lags` instance gives us the necessary information about which time series in `raw_timeseries` correspond to the columns of the embedded dataset, and which lag each column has.
3. Instructions to the estimator
The transfer entropy estimators need the following information about the columns of the generalised reconstruction of your time series (`embedding_pts` in our case):

- Which columns correspond to the future of the target variable (`Tf`)?
- Which columns correspond to the present and past of the target variable (`Tpp`)?
- Which columns correspond to the present and past of the source variable (`Spp`)?
- Which columns correspond to the present/past/future of any variables that we are to condition on (`Cpp`)?
This information is needed to ensure that marginals are properly assigned during transfer entropy computation. The estimators accept this information in the form of a `TEVars` instance, which can be constructed like so:
vars = TEVars(Tf = [1], Tpp = [2, 3], Spp = [4])
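Had the reconstruction also contained columns for variables to condition on, those columns would go in the conditional marginal. A hedged sketch, assuming a fifth column held such a variable and that the corresponding keyword is `Cpp` (following the naming pattern above; check the `TEVars` docstring for the exact keyword):

```julia
# Hypothetical 5-dimensional reconstruction with a conditioning variable in column 5.
# The keyword `Cpp` is an assumption; verify it against the TEVars docstring.
vars_cond = TEVars(Tf = [1], Tpp = [2, 3], Spp = [4], Cpp = [5])
```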
4. Rectangular grid specification
Entropy is essentially a property of a collection of states. To meaningfully talk about states for our generalised state space reconstruction, we will divide the coordinate axes of the reconstruction into a rectangular grid. Each box in the grid will be considered a state, and the probability of visitation is equally distributed within the box.
In this example, we'll use a rectangular partition where the box sizes are determined by splitting each coordinate axis into 6 equally spaced intervals, spanning the range of the data.
binning = RectangularBinning(6)
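The integer argument splits each coordinate axis into that many equally spaced intervals. As used in the grid scan further below, a floating-point argument instead fixes the edge length of the boxes; both forms are sketched here (the float form is inferred from how `RectangularBinning(ϵ)` is used later in this section):

```julia
binning_intervals = RectangularBinning(6)    # 6 equal-width intervals along each axis
binning_edgelen   = RectangularBinning(0.1)  # cubic boxes with edge length 0.1 (as in the scan below)
```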
5. Compute transfer entropy
Now we're ready to compute transfer entropy. First, let's use the `VisitationFrequency` estimator with logarithm to base 2. This gives the transfer entropy in units of bits.
te_vf = transferentropy(embedding_pts, vars, binning, VisitationFrequency(b = 2))
Okay, but what if we want to use another estimator and want the transfer entropy in units of nats? Easy.
transferentropy(embedding_pts, vars, binning, TransferOperatorGrid(b = Base.MathConstants.e))
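For a fixed estimator and partition, values in bits and nats differ only by a factor of ln 2, so you can also convert between units directly. A small consistency check, using the same estimator in both bases:

```julia
te_bits = transferentropy(embedding_pts, vars, binning, VisitationFrequency(b = 2))
te_nats = transferentropy(embedding_pts, vars, binning, VisitationFrequency(b = Base.MathConstants.e))
te_bits * log(2) ≈ te_nats  # entropy in nats = entropy in bits * ln(2)
```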
Above, we computed transfer entropy for one particular choice of partition. But transfer entropy is a function of the partition, so care must be taken when choosing it. Below is an example where we compute transfer entropy over 15 different cubic grids spanning the range of the data, each with a fixed box edge length, with the edge lengths logarithmically spaced from 0.001 to 0.3.
# Box sizes
ϵs = 10 .^ range(log10(0.001), log10(0.3), length = 15)
tes = map(ϵ -> transferentropy(embedding_pts, vars, RectangularBinning(ϵ), VisitationFrequency(b = 2)), ϵs)
`tes` now contains 15 different values of the transfer entropy, one for each of the discretization schemes. For the smallest bin sizes, the transfer entropy is close to or equal to zero, because there are not enough points distributed among the bins (of which there are many when the box edge length is small).
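To see at which scales the estimate becomes non-trivial, you can simply pair each box size with its estimate; a minimal sketch using only Base Julia:

```julia
# Pair each box edge length with its TE estimate and keep the non-zero ones.
scan = collect(zip(ϵs, tes))
nonzero = [(ϵ, te) for (ϵ, te) in scan if te > 0]
```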
Triangulated partitions
Estimators for computing TE on triangulated partitions, where the invariant distribution is obtained through the transfer operator, were also introduced in Diego et al. (2019).
`TransferEntropy.transferentropy` — Method.
transferentropy(μ::AbstractTriangulationInvariantMeasure, vars::TEVars,
binning_scheme::RectangularBinning;
estimator = VisitationFrequency(), n::Int = 10000)
Transfer entropy using a precomputed invariant measure over a triangulated partition
Estimate transfer entropy from an invariant measure over a triangulation that has been precomputed either as

μ = invariantmeasure(pts, TriangulationBinning(), ApproximateIntersection())

or as

μ = invariantmeasure(pts, TriangulationBinning(), ExactIntersection())

where the first method uses approximate simplex intersections (faster) and the second method uses exact simplex intersections (slow). `μ` contains all the information needed to compute transfer entropy.
Note: `pts` must be a vector of states, not a vector of variables (time series). Wrap your time series in a `Dataset` first if the latter is the case.
Computing transfer entropy (triangulation -> rectangular partition)
Because we need to compute marginals, we need a rectangular grid. To obtain one, transfer entropy is computed by sampling the simplices of the triangulation according to their measure, with a total of approximately `n` points. Introducing multiple points as representatives for the partition elements does not introduce any bias, because, in computing the invariant measure, we use no more information than what is encoded in the dynamics of the original data points. However, from the invariant measure we can get a practically infinite number of points to estimate transfer entropy from.
Then, transfer entropy is estimated using the visitation frequency estimator on those points (see the docs for `transferentropy_visitfreq` for more information), on a rectangular grid specified by `binning_scheme`.
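The number of sampled points is controlled by the `n` keyword in the signature above. A hedged sketch of increasing it, assuming a precomputed measure `μ` and a `TEVars` instance `v` as in the example below:

```julia
# Sample roughly 20000 points from the invariant measure instead of the default 10000.
te = transferentropy(μ, v, RectangularBinning(5); n = 20000)
```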
Common use case
This method is useful if you want to explore the sensitivity of transfer entropy to the bin size of the final rectangular grid, but have few observations in the time series. The invariant measure, which encodes the dynamical information, is slow to compute over the triangulation, but only needs to be computed once. After that, transfer entropy may be estimated at multiple scales very quickly.
Example
# Assume these points are an appropriate delay embedding {(x(t), y(t), y(t+1))} and
# that we're measuring transfer entropy from x -> y.
pts = invariantize([rand(3) for i = 1:30])
v = TEVars(Tf = [3], Tpp = [2], Spp = [1])
# Compute invariant measure over a triangulation using approximate
# simplex intersections. This is relatively slow.
μ = invariantmeasure(pts, TriangulationBinning(), ApproximateIntersection())
# Compute transfer entropy from the invariant measure over multiple
# bin sizes. This is fast, because the measure has been precomputed.
tes = map(ϵ -> transferentropy(μ, v, RectangularBinning(ϵ)), 2:50)
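The same scan can also be run from a measure computed with exact simplex intersections, with the estimator set explicitly so that the result is reported in bits; a hedged variation on the example above:

```julia
# Exact simplex intersections (slower, but exact), then the same bin-size scan in bits.
μ_exact = invariantmeasure(pts, TriangulationBinning(), ExactIntersection())
tes_exact = map(ϵ -> transferentropy(μ_exact, v, RectangularBinning(ϵ);
                                     estimator = VisitationFrequency(b = 2)), 2:50)
```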