
Convenience functions for TE estimation

TE estimation between two data series

# CausalityTools.te_reg — Function.

te_reg(source::AbstractArray{<:Real, 1}, 
    response::AbstractArray{<:Real, 1}, 
    k::Int, l::Int, m::Int; 
    η = 1, τ = 1, 
    estimator = VisitationFrequency(), 
    n_subdivs = 1,
    b = 2)

TE estimation with default discretization scheme(s)

Calculate transfer entropy from source to response using the provided estimator on a rectangular discretization of a k + l + m-dimensional delay embedding of the input data, using an embedding delay of τ across all embedding components. η is the prediction lag.

Arguments

  • source: The source data series.
  • response: The response (target) data series.
  • k: The dimension of the T_{f} component of the embedding.
  • l: The dimension of the T_{pp} component of the embedding.
  • m: The dimension of the S_{pp} component of the embedding.

Keyword arguments

  • τ: The embedding lag. Default is τ = 1.
  • η: The prediction lag. Default is η = 1.
  • estimator: The transfer entropy estimator to use. The default is VisitationFrequency().
  • n_subdivs: The number of additional, finer partitions to compute TE over beyond the coarsest one, as described below. The default, n_subdivs = 1, means that TE is computed over two separate partitions. Make sure that n_subdivs is the same across analyses that are to be compared.
  • b: Base of the logarithm. The default (b = 2) gives the TE in bits.
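
A minimal usage sketch (assuming CausalityTools is loaded and exports te_reg as documented above; the toy data are illustrative):

    using CausalityTools

    # Toy data in which x drives y at lag 1 (circshift wraps around at the
    # boundary, which is fine for a demonstration).
    x = rand(500)
    y = 0.5 .* circshift(x, 1) .+ 0.2 .* rand(500)

    # k = l = m = 1 with defaults; returns one TE estimate per partition.
    tes = te_reg(x, y, 1, 1, 1; η = 1, τ = 1, n_subdivs = 1)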

More about the embedding

To compute transfer entropy, we need an appropriate delay embedding of source (S) and target (T). For convenience, define

\begin{align} T_f^{(k)} &= \{(T(t+\eta_k), \ldots, T(t+\eta_2), T(t+\eta_1)) \} \\ T_{pp}^{(l)} &= \{ (T(t), T(t-\tau_1), T(t-\tau_2), \ldots, T(t - \tau_{l - 1})) \} \\ S_{pp}^{(m)} &= \{ (S(t), S(t-\tau_1), S(t-\tau_2), \ldots, S(t-\tau_{m - 1})) \} \end{align}

where T_f denotes the k-dimensional set of vectors furnishing the future states of T, T_{pp} denotes the l-dimensional set of vectors furnishing the past and present states of T, and S_{pp} denotes the m-dimensional set of vectors furnishing the past and present of S. \eta is the prediction lag. This convenience function uses \tau_1 = τ, \tau_2 = 2*τ, \tau_3 = 3*τ, and so on.

Combined, we get the generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}), which is discretized as described below.
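
To make the lag convention concrete, here is a hand-rolled sketch (plain Julia, independent of the package) of the embedding vectors for k = 1, l = 2, m = 2, τ = 1, η = 1:

    # Each element of E is the embedding vector
    # (T(t+η), T(t), T(t-τ), S(t), S(t-τ)) at time t.
    T = rand(100)
    S = rand(100)
    E = [(T[t+1], T[t], T[t-1], S[t], S[t-1]) for t in 2:99]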

More about discretization

To compute TE, we coarse-grain the k+l+m-dimensional generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}) into hyperrectangular boxes. The magnitude of the TE may be biased by the particular choice of binning scheme, so we compute TE across a number of different box sizes, determined as follows.

Let L be the number of observations in S (and T). The coarsest box size is determined by subdividing the i-th coordinate axis into N = ceil(L^{1/(k + l + m + 1)}) equal-length intervals, resulting in a box size of |max(dim_{i}) - min(dim_{i})|/N. The next box size is given by |max(dim_{i}) - min(dim_{i})|/(N+1), then |max(dim_{i}) - min(dim_{i})|/(N+2), and so on, down to the finest box size, |max(dim_{i}) - min(dim_{i})|/(N + n_subdivs).
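
As a sketch of this rule (plain Julia, not a package call), with L = 500 observations and k = l = m = 1:

    L = 500                                   # number of observations
    k, l, m = 1, 1, 1                         # embedding dimensions
    N = ceil(Int, L^(1 / (k + l + m + 1)))    # coarsest partition: here N = 5
    n_subdivs = 1
    ns = N:(N + n_subdivs)                    # intervals per axis, one partition each
    # Along axis i, each partition uses boxes of edge length
    # (maximum(dim_i) - minimum(dim_i)) / n for n in ns.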

Transfer entropy computation

TE is then computed as

\begin{align} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T_f, T_{pp}, S_{pp}) \log_{b}{\left(\frac{P(T_f | T_{pp}, S_{pp})}{P(T_f | T_{pp})}\right)} \end{align}

using the provided estimator (default = VisitationFrequency) for each of the discretizations. A vector of the TE estimates for each discretization is returned.


te_reg(source::AbstractArray{<:Real, 1}, 
    response::AbstractArray{<:Real, 1}, 
    k::Int, l::Int, m::Int,
    binning_scheme::Vector{RectangularBinning}; 
    η = 1, τ = 1, 
    estimator = VisitationFrequency(), 
    b = 2)

TE with user-provided discretization scheme(s)

Calculate transfer entropy from source to response using the provided estimator on discretizations constructed by the provided binning_scheme(s) over a k + l + m-dimensional delay embedding of the input data, using an embedding delay of τ across all embedding components. η is the prediction lag.

Arguments

  • source: The source data series.
  • response: The response (target) data series.
  • k: The dimension of the T_{f} component of the embedding.
  • l: The dimension of the T_{pp} component of the embedding.
  • m: The dimension of the S_{pp} component of the embedding.
  • binning_scheme: The binning scheme(s) used to construct the partitions over which TE is computed. Must be either a single RectangularBinning instance or a vector of RectangularBinning instances. TE is computed for each of the resulting partitions.

Keyword arguments

  • τ: The embedding lag. Default is τ = 1.
  • η: The prediction lag. Default is η = 1.
  • estimator: The transfer entropy estimator to use. The default is VisitationFrequency().
  • b: Base of the logarithm. The default (b = 2) gives the TE in bits.

More about the embedding

To compute transfer entropy, we need an appropriate delay embedding of source (S) and target (T). For convenience, define

\begin{align} T_f^{(k)} &= \{(T(t+\eta_k), \ldots, T(t+\eta_2), T(t+\eta_1)) \} \\ T_{pp}^{(l)} &= \{ (T(t), T(t-\tau_1), T(t-\tau_2), \ldots, T(t - \tau_{l - 1})) \} \\ S_{pp}^{(m)} &= \{ (S(t), S(t-\tau_1), S(t-\tau_2), \ldots, S(t-\tau_{m - 1})) \} \end{align}

where T_f denotes the k-dimensional set of vectors furnishing the future states of T, T_{pp} denotes the l-dimensional set of vectors furnishing the past and present states of T, and S_{pp} denotes the m-dimensional set of vectors furnishing the past and present of S. \eta is the prediction lag. This convenience function uses \tau_1 = τ, \tau_2 = 2*τ, \tau_3 = 3*τ, and so on.

Combined, we get the generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}), which is discretized as described below.

More about discretization

The discretization scheme must be either a single RectangularBinning instance, or a vector of RectangularBinning instances. Run ?RectangularBinning after loading CausalityTools for details.
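
For instance (a sketch, assuming CausalityTools is loaded; the data are illustrative), TE can be computed over three partitions of increasing fineness:

    using CausalityTools

    x = rand(500)
    y = 0.5 .* circshift(x, 1) .+ 0.2 .* rand(500)

    # RectangularBinning(n::Int) splits each axis into n equal-length intervals;
    # see ?RectangularBinning for the other accepted constructor arguments.
    binnings = [RectangularBinning(n) for n in 4:6]
    tes = te_reg(x, y, 1, 1, 1, binnings; η = 1, τ = 1)  # one estimate per binning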

Transfer entropy computation

TE is then computed as

\begin{align} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T_f, T_{pp}, S_{pp}) \log_{b}{\left(\frac{P(T_f | T_{pp}, S_{pp})}{P(T_f | T_{pp})}\right)} \end{align}

using the provided estimator (default = VisitationFrequency) for each of the discretizations. A vector of the TE estimates for each discretization is returned.


TE estimation between two data series conditioned on third series

# CausalityTools.te_cond — Function.

te_cond(source::AbstractArray{<:Real, 1}, 
    response::AbstractArray{<:Real, 1},
    cond::AbstractArray{<:Real, 1},
    k::Int, l::Int, m::Int, n::Int; 
    η = 1, τ = 1, 
    estimator = VisitationFrequency(), 
    n_subdivs = 1,
    b = 2)

Conditional TE with default discretization scheme(s)

Calculate transfer entropy from source to response conditioned on cond using the provided estimator on a rectangular discretization of a k + l + m + n-dimensional delay embedding of the input data, using an embedding delay of τ across all embedding components. η is the prediction lag.

Arguments

  • source: The source data series.
  • response: The response (target) data series.
  • cond: The conditional data series.
  • k: The dimension of the T_{f} component of the embedding.
  • l: The dimension of the T_{pp} component of the embedding.
  • m: The dimension of the S_{pp} component of the embedding.
  • n: The dimension of the C_{pp} component of the embedding.

Keyword arguments

  • τ: The embedding lag. Default is τ = 1.
  • η: The prediction lag. Default is η = 1.
  • estimator: The transfer entropy estimator to use. The default is VisitationFrequency().
  • n_subdivs: The number of additional, finer partitions to compute TE over beyond the coarsest one, as described below. The default, n_subdivs = 1, means that TE is computed over two separate partitions. Make sure that n_subdivs is the same across analyses that are to be compared.
  • b: Base of the logarithm. The default (b = 2) gives the TE in bits.
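
A minimal usage sketch (assuming CausalityTools is loaded; the toy data are illustrative):

    using CausalityTools

    # z drives x, and x drives y; conditioning on z asks whether x still
    # carries information about the future of y beyond what z provides.
    z = rand(500)
    x = 0.5 .* circshift(z, 1) .+ 0.1 .* rand(500)
    y = 0.5 .* circshift(x, 1) .+ 0.1 .* rand(500)

    # k = l = m = n = 1 with the default binning rule.
    tes = te_cond(x, y, z, 1, 1, 1, 1; η = 1, τ = 1, n_subdivs = 1)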

More about the embedding

To compute transfer entropy, we need an appropriate delay embedding of source (S), target (T) and cond (C). For convenience, define

\begin{align} T_f^{(k)} &= \{(T(t+\eta_k), \ldots, T(t+\eta_2), T(t+\eta_1)) \} \\ T_{pp}^{(l)} &= \{ (T(t), T(t-\tau_1), T(t-\tau_2), \ldots, T(t - \tau_{l - 1})) \} \\ S_{pp}^{(m)} &= \{ (S(t), S(t-\tau_1), S(t-\tau_2), \ldots, S(t-\tau_{m - 1})) \} \\ C_{pp}^{(n)} &= \{ (C(t), C(t-\tau_1), C(t-\tau_2), \ldots, C(t-\tau_{n - 1})) \} \end{align}

where T_f denotes the k-dimensional set of vectors furnishing the future states of T, T_{pp} denotes the l-dimensional set of vectors furnishing the past and present states of T, S_{pp} denotes the m-dimensional set of vectors furnishing the past and present of S, and C_{pp} denotes the n-dimensional set of vectors furnishing the past and present of C. \eta is the prediction lag. This convenience function uses \tau_1 = τ, \tau_2 = 2*τ, \tau_3 = 3*τ, and so on.

Combined, we get the generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}, C_{pp}^{(n)}), which is discretized as described below.

More about discretization

To compute TE, we coarse-grain the k+l+m+n-dimensional generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}, C_{pp}^{(n)}) into hyperrectangular boxes. The magnitude of the TE may be biased by the particular choice of binning scheme, so we compute TE across a number of different box sizes, determined as follows.

Let L be the number of observations in S (and T and C). The coarsest box size is determined by subdividing the i-th coordinate axis into N = ceil(L^{1/(k + l + m + n + 1)}) equal-length intervals, resulting in a box size of |max(dim_{i}) - min(dim_{i})|/N. The next box size is given by |max(dim_{i}) - min(dim_{i})|/(N+1), then |max(dim_{i}) - min(dim_{i})|/(N+2), and so on, down to the finest box size, |max(dim_{i}) - min(dim_{i})|/(N + n_subdivs).

Transfer entropy computation

TE is then computed as

\begin{align} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T_f, T_{pp}, S_{pp}, C_{pp}) \log_{b}{\left(\frac{P(T_f | T_{pp}, S_{pp}, C_{pp})}{P(T_f | T_{pp}, C_{pp})}\right)} \end{align}

using the provided estimator (default = VisitationFrequency) for each of the discretizations. A vector of the TE estimates for each discretization is returned.


te_cond(source::AbstractArray{<:Real, 1}, 
    response::AbstractArray{<:Real, 1},
    cond::AbstractArray{<:Real, 1},
    k::Int, l::Int, m::Int, n::Int,
    binning_scheme::Vector{RectangularBinning}; 
    η = 1, τ = 1, 
    estimator = VisitationFrequency(), 
    b = 2)

Conditional TE with user-provided discretization scheme(s)

Calculate transfer entropy from source to response conditioned on cond using the provided estimator on discretizations constructed by the provided binning_scheme(s) over a k + l + m + n-dimensional delay embedding of the input data, using an embedding delay of τ across all embedding components. η is the prediction lag.

Arguments

  • source: The source data series.
  • response: The response (target) data series.
  • cond: The conditional data series.
  • k: The dimension of the T_{f} component of the embedding.
  • l: The dimension of the T_{pp} component of the embedding.
  • m: The dimension of the S_{pp} component of the embedding.
  • n: The dimension of the C_{pp} component of the embedding.
  • binning_scheme: The binning scheme(s) used to construct the partitions over which TE is computed. Must be either a single RectangularBinning instance or a vector of RectangularBinning instances. TE is computed for each of the resulting partitions.

Keyword arguments

  • τ: The embedding lag. Default is τ = 1.
  • η: The prediction lag. Default is η = 1.
  • estimator: The transfer entropy estimator to use. The default is VisitationFrequency().
  • b: Base of the logarithm. The default (b = 2) gives the TE in bits.
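
As with te_reg, the partitions can be supplied explicitly (a sketch, assuming CausalityTools is loaded; the data are illustrative):

    using CausalityTools

    z = rand(500)
    x = 0.5 .* circshift(z, 1) .+ 0.1 .* rand(500)
    y = 0.5 .* circshift(x, 1) .+ 0.1 .* rand(500)

    # Three partitions of increasing fineness; one TE estimate per binning.
    binnings = [RectangularBinning(n) for n in 4:6]
    tes = te_cond(x, y, z, 1, 1, 1, 1, binnings; η = 1, τ = 1)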

More about the embedding

To compute transfer entropy, we need an appropriate delay embedding of source (S), target (T) and cond (C). For convenience, define

\begin{align} T_f^{(k)} &= \{(T(t+\eta_k), \ldots, T(t+\eta_2), T(t+\eta_1)) \} \\ T_{pp}^{(l)} &= \{ (T(t), T(t-\tau_1), T(t-\tau_2), \ldots, T(t - \tau_{l - 1})) \} \\ S_{pp}^{(m)} &= \{ (S(t), S(t-\tau_1), S(t-\tau_2), \ldots, S(t-\tau_{m - 1})) \} \\ C_{pp}^{(n)} &= \{ (C(t), C(t-\tau_1), C(t-\tau_2), \ldots, C(t-\tau_{n - 1})) \} \end{align}

where T_f denotes the k-dimensional set of vectors furnishing the future states of T, T_{pp} denotes the l-dimensional set of vectors furnishing the past and present states of T, S_{pp} denotes the m-dimensional set of vectors furnishing the past and present of S, and C_{pp} denotes the n-dimensional set of vectors furnishing the past and present of C. \eta is the prediction lag. This convenience function uses \tau_1 = τ, \tau_2 = 2*τ, \tau_3 = 3*τ, and so on.

Combined, we get the generalised embedding \mathbb{E} = (T_f^{(k)}, T_{pp}^{(l)}, S_{pp}^{(m)}, C_{pp}^{(n)}), which is discretized as described below.

More about discretization

The discretization scheme must be either a single RectangularBinning instance, or a vector of RectangularBinning instances. Run ?RectangularBinning after loading CausalityTools for details.

Transfer entropy computation

TE is then computed as

\begin{align} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T_f, T_{pp}, S_{pp}, C_{pp}) \log_{b}{\left(\frac{P(T_f | T_{pp}, S_{pp}, C_{pp})}{P(T_f | T_{pp}, C_{pp})}\right)} \end{align}

using the provided estimator (default = VisitationFrequency) for each of the discretizations. A vector of the TE estimates for each discretization is returned.
