Associations
Association API
The most basic components of Associations.jl are a collection of statistics that in some manner quantify the "association" between input datasets. Precisely what is meant by "association" depends on the measure, and precisely what is meant by "quantify" depends on the estimator of that measure. We formalize this notion below with the association function, which dispatches on AssociationMeasureEstimator and AssociationMeasure.
Associations.association
— Functionassociation(estimator::AssociationMeasureEstimator, x, y, [z, ...]) → r
association(definition::AssociationMeasure, x, y, [z, ...]) → r
Estimate the (conditional) association between input variables x, y, z, …
using the given estimator
(an AssociationMeasureEstimator
) or definition
(an AssociationMeasure
).
The type of the return value r
depends on the measure
/estimator
. The interpretation of the returned value also depends on the specific measure and estimator used.
Examples
The examples section of the online documentation has numerous examples of using association.
Associations.AssociationMeasure
— TypeAssociationMeasure
The supertype of all association measures.
Abstract implementations
Currently, the association measures are classified by abstract classes listed below. These abstract classes offer common functionality among association measures that are conceptually similar. This makes maintenance and framework extension easier than if each measure was implemented "in isolation".
Concrete implementations
Concrete subtypes are given as input to association
. Many of these types require an AssociationMeasureEstimator
to compute.
Type | AssociationMeasure | Pairwise | Conditional |
---|---|---|---|
Correlation | PearsonCorrelation | ✓ | ✖ |
Correlation | PartialCorrelation | ✓ | ✓ |
Correlation | DistanceCorrelation | ✓ | ✓ |
Correlation | ChatterjeeCorrelation | ✓ | ✖ |
Correlation | AzadkiaChatterjeeCoefficient | ✓ | ✓ |
Closeness | SMeasure | ✓ | ✖ |
Closeness | HMeasure | ✓ | ✖ |
Closeness | MMeasure | ✓ | ✖ |
Closeness (ranks) | LMeasure | ✓ | ✖ |
Closeness | JointDistanceDistribution | ✓ | ✖ |
Cross-mapping | PairwiseAsymmetricInference | ✓ | ✖ |
Cross-mapping | ConvergentCrossMapping | ✓ | ✖ |
Conditional recurrence | MCR | ✓ | ✖ |
Conditional recurrence | RMCD | ✓ | ✓ |
Shared information | MIShannon | ✓ | ✖ |
Shared information | MIRenyiJizba | ✓ | ✖ |
Shared information | MIRenyiSarbu | ✓ | ✖ |
Shared information | MITsallisFuruichi | ✓ | ✖ |
Shared information | PartialCorrelation | ✖ | ✓ |
Shared information | CMIShannon | ✖ | ✓ |
Shared information | CMIRenyiSarbu | ✖ | ✓ |
Shared information | CMIRenyiJizba | ✖ | ✓ |
Shared information | CMIRenyiPoczos | ✖ | ✓ |
Shared information | CMITsallisPapapetrou | ✖ | ✓ |
Shared information | ShortExpansionConditionalMutualInformation | ✖ | ✓ |
Information transfer | TEShannon | ✓ | ✓ |
Information transfer | TERenyiJizba | ✓ | ✓ |
Partial mutual information | PartialMutualInformation | ✖ | ✓ |
Information measure | JointEntropyShannon | ✓ | ✖ |
Information measure | JointEntropyRenyi | ✓ | ✖ |
Information measure | JointEntropyTsallis | ✓ | ✖ |
Information measure | ConditionalEntropyShannon | ✓ | ✖ |
Information measure | ConditionalEntropyTsallisAbe | ✓ | ✖ |
Information measure | ConditionalEntropyTsallisFuruichi | ✓ | ✖ |
Divergence | HellingerDistance | ✓ | ✖ |
Divergence | KLDivergence | ✓ | ✖ |
Divergence | RenyiDivergence | ✓ | ✖ |
Divergence | VariationDistance | ✓ | ✖ |
Associations.AssociationMeasureEstimator
— TypeAssociationMeasureEstimator
The supertype of all association measure estimators.
Concrete subtypes are given as input to association
.
Abstract subtypes
Concrete implementations
Here are some examples of how to use association
.
julia> using Associations
julia> x, y, z = rand(1000), rand(1000), rand(1000);
julia> association(LMeasure(), x, y)
-0.00847333277052152
julia> association(DistanceCorrelation(), x, y)
0.03144088193552327
julia> association(JointProbabilities(JointEntropyShannon(), CodifyVariables(Dispersion(c = 3, m = 2))), x, y)
3.1300103114286615
julia> association(EntropyDecomposition(MIShannon(), PlugIn(Shannon()), CodifyVariables(OrdinalPatterns(m=3))), x, y)
0.010614076845458342
julia> association(KSG2(MIShannon(base = 2)), x, y)
-0.184200649725621
julia> association(JointProbabilities(PartialMutualInformation(), CodifyVariables(OrdinalPatterns(m=3))), x, y, z)
0.24183294965070323
julia> association(FPVP(CMIShannon(base = 2)), x, y, z)
-0.33991478474691844
Information measures
Associations.MultivariateInformationMeasure
— TypeMultivariateInformationMeasure <: AssociationMeasure
The supertype for all multivariate information-based measure definitions.
Definition
Following Datseris and Haaga (2024), we define a multivariate information measure as any functional of a multidimensional probability mass function (PMF) or a multidimensional probability density.
Implementations
JointEntropy
definitions:
ConditionalEntropy
definitions:
DivergenceOrDistance
definitions:
MutualInformation
definitions:
ConditionalMutualInformation
definitions:
TransferEntropy
definitions:
Other definitions:
Conditional entropies
Associations.ConditionalEntropy
— TypeConditionalEntropy <: MultivariateInformationMeasure
The supertype for all conditional entropy measures.
Concrete subtypes
Associations.ConditionalEntropyShannon
— TypeConditionalEntropyShannon <: ConditionalEntropy
ConditionalEntropyShannon(; base = 2)
The Shannon
conditional entropy measure.
Usage
- Use with
association
to compute the Shannon conditional entropy between two variables.
Compatible estimators
Discrete definition
Sum formulation
The conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H^{S}(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log(p(x | y)).\]
This is the definition used when calling association
with a JointProbabilities
estimator.
Two-entropies formulation
Equivalently, the conditional entropy can be written as the following difference of entropies,
\[H^S(X | Y) = H^S(X, Y) - H^S(Y),\]
where $H^S(\cdot)$ and $H^S(\cdot, \cdot)$ are the Shannon entropy and Shannon joint entropy, respectively. This is the definition used when calling association with a ProbabilitiesEstimator.
Differential definition
The differential conditional Shannon entropy is analogously defined as
\[h^S(X | Y) = h^S(X, Y) - h^S(Y),\]
where $h^S(\cdot)$ and $h^S(\cdot, \cdot)$ are the Shannon differential entropy and Shannon joint differential entropy, respectively. This is the definition used when calling association with a DifferentialInfoEstimator.
Estimation
- Example 1: Analytical example from Cover & Thomas's book.
- Example 2: JointProbabilities estimator with CodifyVariables discretization and UniqueElements outcome space on categorical data.
- Example 3: JointProbabilities estimator with CodifyPoints discretization and UniqueElementsEncoding encoding of points on numerical data.
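A minimal sketch of the combination in Example 2, assuming the JointProbabilities/CodifyVariables/UniqueElements pattern shown in the examples at the top of this page:
using Associations
# Hypothetical categorical input data.
x = rand(["a", "b", "c"], 1000)
y = rand(["left", "right"], 1000)
# Discretize each variable by its unique elements, then compute the
# Shannon conditional entropy from the joint probability mass function.
est = JointProbabilities(ConditionalEntropyShannon(base = 2), CodifyVariables(UniqueElements()))
association(est, x, y)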
Associations.ConditionalEntropyTsallisFuruichi
— TypeConditionalEntropyTsallisFuruichi <: ConditionalEntropy
ConditionalEntropyTsallisFuruichi(; base = 2, q = 1.5)
Furuichi (2006)'s discrete Tsallis conditional entropy definition.
Usage
- Use with
association
to compute the Tsallis-Furuichi conditional entropy between two variables.
Compatible estimators
Definition
Furuichi's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H_q^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q(p(x | y)),\]
where $\log_q(x) = \frac{x^{1-q} - 1}{1 - q}$ is the q-logarithm and $q \neq 1$. For $q = 1$, $H_q^T(X | Y)$ reduces to the Shannon conditional entropy:
\[H_{q=1}^T(X | Y) = -\sum_{x \in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log(p(x | y)).\]
If any of the entries of the marginal distribution for Y
are zero, or the q-logarithm is undefined for a particular value, then the measure is undefined and NaN
is returned.
Estimation
- Example 1: JointProbabilities estimator with CodifyVariables discretization and UniqueElements outcome space on categorical data.
- Example 2: JointProbabilities estimator with CodifyPoints discretization and UniqueElementsEncoding encoding of points on numerical data.
Associations.ConditionalEntropyTsallisAbe
— TypeConditionalEntropyTsallisAbe <: ConditionalEntropy
ConditionalEntropyTsallisAbe(; base = 2, q = 1.5)
Abe and Rajagopal (2001)'s discrete Tsallis conditional entropy measure.
Usage
- Use with
association
to compute the Tsallis-Abe conditional entropy between two variables.
Compatible estimators
Definition
Abe & Rajagopal's Tsallis conditional entropy between discrete random variables $X$ and $Y$ with finite ranges $\mathcal{X}$ and $\mathcal{Y}$ is defined as
\[H_q^{T_A}(X | Y) = \dfrac{H_q^T(X, Y) - H_q^T(Y)}{1 + (1-q)H_q^T(Y)},\]
where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the Tsallis entropy and the joint Tsallis entropy, respectively.
Estimation
- Example 1: JointProbabilities estimator with CodifyVariables discretization and UniqueElements outcome space on categorical data.
- Example 2: JointProbabilities estimator with CodifyPoints discretization and UniqueElementsEncoding encoding of points on numerical data.
Divergences and distances
Associations.DivergenceOrDistance
— TypeDivergenceOrDistance <: BivariateInformationMeasure
The supertype for bivariate information measures aiming to quantify some sort of divergence, distance or closeness between two probability distributions.
Some of these measures are proper metrics, while others are not, but they have in common that they aim to quantify how "far from each other" two probability distributions are.
Concrete implementations
Associations.HellingerDistance
— TypeHellingerDistance <: DivergenceOrDistance
The Hellinger distance.
Usage
- Use with association to compute the Hellinger distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The Hellinger distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{H}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{\sqrt{2}} \sum_{\omega \in \Omega} (\sqrt{p_x(\omega)} - \sqrt{p_y(\omega)})^2\]
Estimation
- Example 1: From precomputed probabilities.
- Example 2: JointProbabilities with OrdinalPatterns outcome space.
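A minimal sketch of Example 2, assuming the JointProbabilities/CodifyVariables/OrdinalPatterns combination used elsewhere on this page also applies here:
using Associations
x, y = rand(1000), rand(1000)
# Discretize both variables with a length-3 ordinal-pattern outcome space and
# compute the Hellinger distance between the two resulting distributions.
est = JointProbabilities(HellingerDistance(), CodifyVariables(OrdinalPatterns(m = 3)))
association(est, x, y)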
Associations.KLDivergence
— TypeKLDivergence <: DivergenceOrDistance
The Kullback-Leibler (KL) divergence.
Usage
- Use with association to compute the KL-divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The KL-divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{KL}(P_X(\Omega) || P_Y(\Omega)) = \sum_{\omega \in \Omega} p_x(\omega) \log\dfrac{p_x(\omega)}{p_y(\omega)}\]
Used with association to compute the KL-divergence between two pre-computed probability distributions. If used with RelativeAmount, the KL-divergence may be undefined due to some outcomes having zero counts. Use some other ProbabilitiesEstimator like BayesianRegularization to ensure all estimated probabilities are nonzero.
Distances.jl also defines KLDivergence. Qualify it if you're loading both packages, i.e. do association(Associations.KLDivergence(), x, y).
Estimation
- Example 1: From precomputed probabilities.
- Example 2: JointProbabilities with OrdinalPatterns outcome space.
Associations.RenyiDivergence
— TypeRenyiDivergence <: DivergenceOrDistance
RenyiDivergence(q; base = 2)
The Rényi divergence of positive order q
.
Usage
- Use with association to compute the Rényi divergence between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The Rényi divergence between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as (van Erven and Harremos, 2014)
\[D_{q}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{q - 1} \log \sum_{\omega \in \Omega}p_x(\omega)^{q}p_y(\omega)^{1-q}\]
Used with association to compute the Rényi divergence between two pre-computed probability distributions. If used with RelativeAmount, the Rényi divergence may be undefined due to some outcomes having zero counts. Use some other ProbabilitiesEstimator like BayesianRegularization to ensure all estimated probabilities are nonzero.
Distances.jl also defines RenyiDivergence. Qualify it if you're loading both packages, i.e. do association(Associations.RenyiDivergence(), x, y).
Estimation
- Example 1: From precomputed probabilities.
- Example 2: JointProbabilities with OrdinalPatterns outcome space.
Associations.VariationDistance
— TypeVariationDistance <: DivergenceOrDistance
The variation distance.
Usage
- Use with association to compute the variation distance between two pre-computed probability distributions, or from raw data using one of the estimators listed below.
Compatible estimators
Description
The variation distance between two probability distributions $P_X = (p_x(\omega_1), \ldots, p_x(\omega_n))$ and $P_Y = (p_y(\omega_1), \ldots, p_y(\omega_m))$, both defined over the same OutcomeSpace
$\Omega = \{\omega_1, \ldots, \omega_n \}$, is defined as
\[D_{V}(P_X(\Omega) || P_Y(\Omega)) = \dfrac{1}{2} \sum_{\omega \in \Omega} | p_x(\omega) - p_y(\omega) |\]
Examples
- Example 1: From precomputed probabilities.
- Example 2: JointProbabilities with OrdinalPatterns outcome space.
Joint entropies
Associations.JointEntropy
— TypeJointEntropy <: BivariateInformationMeasure
The supertype for all joint entropy measures.
Concrete implementations
Associations.JointEntropyShannon
— TypeJointEntropyShannon <: JointEntropy
JointEntropyShannon(; base = 2)
The Shannon joint entropy measure (Cover, 1999).
Usage
- Use with
association
to compute the Shannon joint entropy between two variables.
Compatible estimators
Definition
Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, Cover (1999) defines the Shannon joint entropy as
\[H^S(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y) \log p(x, y),\]
where we define $\log p(x, y) := 0$ if $p(x, y) = 0$.
Estimation
- Example 1: JointProbabilities with Dispersion outcome space.
Associations.JointEntropyTsallis
— TypeJointEntropyTsallis <: JointEntropy
JointEntropyTsallis(; base = 2, q = 1.5)
The Tsallis joint entropy definition from Furuichi (2006).
Usage
- Use with
association
to compute the Furuichi-Tsallis joint entropy between two variables.
Compatible estimators
Definition
Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, Furuichi (2006) defines the Tsallis joint entropy as
\[H_q^T(X, Y) = -\sum_{x\in \mathcal{X}, y \in \mathcal{Y}} p(x, y)^q \log_q p(x, y),\]
where $\log_q(x) = \dfrac{x^{1-q} - 1}{1-q}$ is the q-logarithm, and we define $\log_q(0) := 0$.
Estimation
- Example 1: JointProbabilities with OrdinalPatterns outcome space.
Associations.JointEntropyRenyi
— TypeJointEntropyRenyi <: JointEntropy
JointEntropyRenyi(; base = 2, q = 1.5)
The Rényi joint entropy measure (Golshani et al., 2009).
Usage
- Use with
association
to compute the Golshani-Rényi joint entropy between two variables.
Compatible estimators
Definition
Given two discrete random variables $X$ and $Y$ with ranges $\mathcal{X}$ and $\mathcal{Y}$, Golshani et al. (2009) defines the Rényi joint entropy as
\[H_q^R(X, Y) = \dfrac{1}{1-q} \log \sum_{i = 1}^N p_i^q,\]
where $q > 0$ and $q \neq 1$.
Estimation
- Example 1: JointProbabilities with ValueBinning outcome space.
Mutual informations
Associations.MutualInformation
— TypeMutualInformation
Abstract type for all mutual information measures.
Concrete implementations
See also: MutualInformationEstimator
Associations.MIShannon
— TypeMIShannon <: BivariateInformationMeasure
MIShannon(; base = 2)
The Shannon mutual information $I_S(X; Y)$.
Usage
- Use with association to compute the raw Shannon mutual information from input data using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Shannon mutual information.
Compatible estimators
- JointProbabilities (generic)
- EntropyDecomposition (generic)
- KraskovStögbauerGrassberger1
- KraskovStögbauerGrassberger2
- GaoOhViswanath
- GaoKannanOhViswanath
- GaussianMI
Discrete definition
There are many equivalent formulations of discrete Shannon mutual information, meaning that it can be estimated in several ways, either using JointProbabilities
(double-sum formulation), EntropyDecomposition
(three-entropies decomposition), or some dedicated estimator.
Double sum formulation
Assume we observe samples $\bar{\bf{X}}_{1:N_x} = \{\bar{\bf{X}}_1, \ldots, \bar{\bf{X}}_{N_x} \}$ and $\bar{\bf{Y}}_{1:N_y} = \{\bar{\bf{Y}}_1, \ldots, \bar{\bf{Y}}_{N_y} \}$ from two discrete random variables $X$ and $Y$ with finite supports $\mathcal{X} = \{ x_1, x_2, \ldots, x_{M_x} \}$ and $\mathcal{Y} = \{ y_1, y_2, \ldots, y_{M_y} \}$. The double-sum estimate is
\[\hat{I}_{DS}(X; Y) = \sum_{x_i \in \mathcal{X}, y_j \in \mathcal{Y}} \hat{p}(x_i, y_j) \log \left( \dfrac{\hat{p}(x_i, y_j)}{\hat{p}(x_i)\hat{p}(y_j)} \right),\]
where $\hat{p}(x_i)$, $\hat{p}(y_j)$ and $\hat{p}(x_i, y_j)$ are relative-frequency estimates of the marginal and joint probabilities. This definition is used by association when called with a JointProbabilities estimator.
when called with a JointProbabilities
estimator.
Three-entropies formulation
An equivalent formulation of discrete Shannon mutual information is
\[I^S(X; Y) = H^S(X) + H^S(Y) - H^S(X, Y),\]
where $H^S(\cdot)$ and $H^S(\cdot, \cdot)$ are the marginal and joint discrete Shannon entropies. This definition is used by association when called with an EntropyDecomposition estimator and a discretization.
Differential mutual information
One possible formulation of differential Shannon mutual information is
\[I^S(X; Y) = h^S(X) + h^S(Y) - h^S(X, Y),\]
where $h^S(\cdot)$ and $h^S(\cdot, \cdot)$ are the marginal and joint differential Shannon entropies. This definition is used by association
when called with EntropyDecomposition
estimator and a DifferentialInfoEstimator
.
Estimation
- Example 1: JointProbabilities with ValueBinning outcome space.
- Example 2: JointProbabilities with UniqueElements outcome space on string data.
- Example 3: Dedicated GaussianMI estimator.
- Example 4: Dedicated KraskovStögbauerGrassberger1 estimator.
- Example 5: Dedicated KraskovStögbauerGrassberger2 estimator.
- Example 6: Dedicated GaoKannanOhViswanath estimator.
- Example 7: EntropyDecomposition with Kraskov estimator.
- Example 8: EntropyDecomposition with BubbleSortSwaps.
- Example 9: EntropyDecomposition with Jackknife estimator and ValueBinning outcome space.
- Example 10: Reproducing Kraskov et al. (2004).
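Two of these patterns appear verbatim in the examples at the top of this page; here they are again as a minimal sketch (the ordinal-pattern length m = 3 is an arbitrary choice):
using Associations
x, y = rand(1000), rand(1000)
# Differential Shannon MI with the dedicated KSG2 estimator.
association(KSG2(MIShannon(base = 2)), x, y)
# Discrete Shannon MI via a three-entropies decomposition with an ordinal-pattern discretization.
association(EntropyDecomposition(MIShannon(), PlugIn(Shannon()), CodifyVariables(OrdinalPatterns(m = 3))), x, y)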
Associations.MITsallisFuruichi
— TypeMITsallisFuruichi <: BivariateInformationMeasure
MITsallisFuruichi(; base = 2, q = 1.5)
The discrete Tsallis mutual information from Furuichi (2006), which in that paper is called the mutual entropy.
Usage
- Use with association to compute the raw Tsallis-Furuichi mutual information from input data using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Tsallis-Furuichi mutual information.
Compatible estimators
Description
Furuichi's Tsallis mutual entropy between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as
\[I_q^T(X; Y) = H_q^T(X) - H_q^T(X | Y) = H_q^T(X) + H_q^T(Y) - H_q^T(X, Y),\]
where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the marginal and joint Tsallis entropies, and q is the Tsallis parameter.
Estimation
- Example 1: JointProbabilities with UniqueElements outcome space.
- Example 2: EntropyDecomposition with LeonenkoProzantoSavani estimator.
- Example 3: EntropyDecomposition with Dispersion outcome space.
— TypeMITsallisMartin <: BivariateInformationMeasure
MITsallisMartin(; base = 2, q = 1.5)
The discrete Tsallis mutual information from Martin et al. (2004).
Usage
- Use with association to compute the raw Tsallis-Martin mutual information from input data using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Tsallis-Martin mutual information.
Compatible estimators
Description
Martin et al.'s Tsallis mutual information between variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$ is defined as
\[I_{\text{Martin}}^T(X, Y, q) := H_q^T(X) + H_q^T(Y) - (1 - q) H_q^T(X) H_q^T(Y) - H_q^T(X, Y),\]
where $H_q^T(\cdot)$ and $H_q^T(\cdot, \cdot)$ are the marginal and joint Tsallis entropies, and q is the Tsallis parameter.
Estimation
- Example 1: JointProbabilities with UniqueElements outcome space.
- Example 2: EntropyDecomposition with LeonenkoProzantoSavani estimator.
- Example 3: EntropyDecomposition with OrdinalPatterns outcome space.
Associations.MIRenyiJizba
— TypeMIRenyiJizba <: BivariateInformationMeasure
MIRenyiJizba(; q = 1.5, base = 2)
The Rényi mutual information $I_q^{R_{J}}(X; Y)$ defined in (Jizba et al., 2012).
Usage
- Use with association to compute the raw Rényi-Jizba mutual information from input data using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Rényi-Jizba mutual information.
Compatible estimators
Definition
\[I_q^{R_{J}}(X; Y) = H_q^{R}(X) + H_q^{R}(Y) - H_q^{R}(X, Y),\]
where $H_q^{R}(\cdot)$ is the Rényi
entropy.
Estimation
- Example 1: JointProbabilities with UniqueElements outcome space.
- Example 2: EntropyDecomposition with LeonenkoProzantoSavani.
- Example 3: EntropyDecomposition with ValueBinning.
Associations.MIRenyiSarbu
— TypeMIRenyiSarbu <: BivariateInformationMeasure
MIRenyiSarbu(; base = 2, q = 1.5)
The discrete Rényi mutual information from Sarbu (2014).
Usage
- Use with association to compute the raw Rényi-Sarbu mutual information from input data using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Rényi-Sarbu mutual information.
Compatible estimators
Description
Sarbu (2014) defines discrete Rényi mutual information as the Rényi $\alpha$-divergence between the joint probability mass function $p(x, y)$ and the product of the marginals, $p(x) \cdot p(y)$:
\[I(X, Y)^R_q = \dfrac{1}{q-1} \log \left( \sum_{x \in X, y \in Y} \dfrac{p(x, y)^q}{\left( p(x)\cdot p(y) \right)^{q-1}} \right)\]
Estimation
- Example 1: JointProbabilities with UniqueElements for categorical data.
- Example 2: JointProbabilities with CosineSimilarityBinning for numerical data.
Conditional mutual informations
Associations.ConditionalMutualInformation
— TypeConditionalMutualInformation
Abstract type for all conditional mutual information measures.
Concrete implementations
See also: ConditionalMutualInformationEstimator
Associations.CMIShannon
— TypeCMIShannon <: ConditionalMutualInformation
CMIShannon(; base = 2)
The Shannon conditional mutual information (CMI) $I^S(X; Y | Z)$.
Usage
- Use with association to compute the raw Shannon conditional mutual information using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Shannon conditional mutual information.
Compatible estimators
- JointProbabilities
- EntropyDecomposition
- MIDecomposition
- FPVP
- MesnerShalizi
- Rahimzamani
- PoczosSchneiderCMI
- GaussianCMI
Supported definitions
Consider random variables $X \in \mathbb{R}^{d_X}$ and $Y \in \mathbb{R}^{d_Y}$, given $Z \in \mathbb{R}^{d_Z}$. The Shannon conditional mutual information is defined as
\[\begin{align*} I(X; Y | Z) &= H^S(X, Z) + H^S(Y, Z) - H^S(X, Y, Z) - H^S(Z) \\ &= I^S(X; Y, Z) - I^S(X; Z) \end{align*},\]
where $I^S(\cdot; \cdot)$ is the Shannon mutual information MIShannon
, and $H^S(\cdot)$ is the Shannon
entropy.
Differential Shannon CMI is obtained by replacing the entropies by differential entropies.
Estimation
- Example 1: EntropyDecomposition with Kraskov estimator.
- Example 2: EntropyDecomposition with ValueBinning estimator.
- Example 3: MIDecomposition with KraskovStögbauerGrassberger1 estimator.
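A minimal sketch using the dedicated FPVP estimator, which also appears in the examples at the top of this page:
using Associations
x, y, z = rand(1000), rand(1000), rand(1000)
# Differential Shannon CMI between x and y, conditioned on z.
association(FPVP(CMIShannon(base = 2)), x, y, z)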
Associations.CMIRenyiSarbu
— TypeCMIRenyiSarbu <: ConditionalMutualInformation
CMIRenyiSarbu(; base = 2, q = 1.5)
The Rényi conditional mutual information from Sarbu (2014).
Usage
- Use with association to compute the raw Rényi-Sarbu conditional mutual information using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Sarbu conditional mutual information.
Compatible estimators
Discrete description
Assume we observe three discrete random variables $X$, $Y$ and $Z$. Sarbu (2014) defines discrete conditional Rényi mutual information as the conditional Rényi $\alpha$-divergence between the conditional joint probability mass function $p(x, y | z)$ and the product of the conditional marginals, $p(x |z) \cdot p(y|z)$:
\[I(X, Y; Z)^R_q = \dfrac{1}{q-1} \sum_{z \in Z} p(Z = z) \log \left( \sum_{x \in X}\sum_{y \in Y} \dfrac{p(x, y|z)^q}{\left( p(x|z)\cdot p(y|z) \right)^{q-1}} \right)\]
Associations.CMIRenyiJizba
— TypeCMIRenyiJizba <: ConditionalMutualInformation
CMIRenyiJizba(; base = 2, q = 1.5)
The Rényi conditional mutual information $I_q^{R_{J}}(X; Y | Z)$ defined in Jizba et al. (2012).
Usage
- Use with association to compute the raw Rényi-Jizba conditional mutual information using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Jizba conditional mutual information.
Compatible estimators
Definition
\[I_q^{R_{J}}(X; Y | Z) = I_q^{R_{J}}(X; Y, Z) - I_q^{R_{J}}(X; Z),\]
where $I_q^{R_{J}}(X; Z)$ is the MIRenyiJizba
mutual information.
Estimation
- Example 1: JointProbabilities with BubbleSortSwaps outcome space.
- Example 2: EntropyDecomposition with OrdinalPatterns outcome space.
- Example 3: EntropyDecomposition with differential entropy estimator LeonenkoProzantoSavani.
Associations.CMIRenyiPoczos
— TypeCMIRenyiPoczos <: ConditionalMutualInformation
CMIRenyiPoczos(; base = 2, q = 1.5)
The differential Rényi conditional mutual information $I_q^{R_{P}}(X; Y | Z)$ defined in Póczos and Schneider (2012).
Usage
- Use with association to compute the raw Rényi-Poczos conditional mutual information using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Rényi-Poczos conditional mutual information.
Compatible estimators
Definition
\[\begin{align*} I_q^{R_{P}}(X; Y | Z) &= \dfrac{1}{q-1} \int \int \int \dfrac{p_Z(z) p_{X, Y | Z}^q}{( p_{X|Z}(x|z) p_{Y|Z}(y|z) )^{q-1}} \\ &= \mathbb{E}_{X, Y, Z} \sim p_{X, Y, Z} \left[ \dfrac{p_{X, Z}^{1-q}(X, Z) p_{Y, Z}^{1-q}(Y, Z) }{p_{X, Y, Z}^{1-q}(X, Y, Z) p_Z^{1-q}(Z)} \right] \end{align*}\]
Estimation
- Example 1: Dedicated PoczosSchneiderCMI estimator.
Associations.CMITsallisPapapetrou
— TypeCMITsallisPapapetrou <: ConditionalMutualInformation
CMITsallisPapapetrou(; base = 2, q = 1.5)
The Tsallis-Papapetrou conditional mutual information (Papapetrou and Kugiumtzis, 2020).
Usage
- Use with association to compute the raw Tsallis-Papapetrou conditional mutual information using one of the estimators listed below.
- Use with independence to perform a formal hypothesis test for pairwise conditional independence using the Tsallis-Papapetrou conditional mutual information.
Compatible estimators
Definition
Tsallis-Papapetrou conditional mutual information is defined as
\[I_T^q(X, Y \mid Z) = \frac{1}{1 - q} \left( 1 - \sum_{XYZ} \frac{p(x, y, z)^q}{p(x \mid z)^{q-1} p(y \mid z)^{q-1} p(z)^{q-1}} \right).\]
Transfer entropy
Associations.TransferEntropy
— TypeTransferEntropy <: AssociationMeasure
The supertype of all transfer entropy measures. Concrete subtypes are
Associations.TEShannon
— TypeTEShannon <: TransferEntropy
TEShannon(; base = 2, embedding = EmbeddingTE()) <: TransferEntropy
The Shannon-type transfer entropy measure.
Usage
- Use with association to compute the raw transfer entropy.
- Use with an IndependenceTest to perform a formal hypothesis test for pairwise and conditional dependence.
Description
The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as
\[\begin{align*} TE(S \to T) &:= I^S(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I^S(T^+; S^- | T^-, C^-) \end{align*}\]
where $I(T^+; S^- | T^-)$ is the Shannon conditional mutual information (CMIShannon
). The -
and +
subscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.
Estimation
- Example 1: EntropyDecomposition with TransferOperator outcome space.
- Example 2: Estimation using the SymbolicTransferEntropy estimator.
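A minimal sketch, assuming the EntropyDecomposition pattern shown earlier for MIShannon carries over to TEShannon (the ordinal-pattern discretization is an arbitrary choice):
using Associations
x = rand(1000)
y = 0.5 .* circshift(x, 1) .+ 0.5 .* rand(1000)   # y depends on past values of x
est = EntropyDecomposition(TEShannon(), PlugIn(Shannon()), CodifyVariables(OrdinalPatterns(m = 3)))
association(est, x, y)   # transfer entropy from x to y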
Associations.TERenyiJizba
— TypeTERenyiJizba() <: TransferEntropy
The Rényi transfer entropy from Jizba et al. (2012).
Usage
- Use with association to compute the raw transfer entropy.
- Use with an IndependenceTest to perform a formal hypothesis test for pairwise and conditional dependence.
Description
The transfer entropy from source $S$ to target $T$, potentially conditioned on $C$ is defined as
\[\begin{align*} TE(S \to T) &:= I_q^{R_J}(T^+; S^- | T^-) \\ TE(S \to T | C) &:= I_q^{R_J}(T^+; S^- | T^-, C^-), \end{align*},\]
where $I_q^{R_J}(T^+; S^- | T^-)$ is Jizba et al. (2012)'s definition of conditional mutual information (CMIRenyiJizba
). The -
and +
subscripts on the marginal variables $T^+$, $T^-$, $S^-$ and $C^-$ indicate that the embedding vectors for that marginal are constructed using present/past values and future values, respectively.
Estimation
Estimating Jizba's Rényi transfer entropy is a bit complicated, since it doesn't have a dedicated estimator. Instead, we re-write the Rényi transfer entropy as a Rényi conditional mutual information, and estimate it using an EntropyDecomposition
with a suitable discrete/differential Rényi entropy estimator from the list below as its input.
Estimator | Sub-estimator | Principle |
---|---|---|
EntropyDecomposition | LeonenkoProzantoSavani | Four-entropies decomposition |
EntropyDecomposition | ValueBinning | Four-entropies decomposition |
EntropyDecomposition | Dispersion | Four-entropies decomposition |
EntropyDecomposition | OrdinalPatterns | Four-entropies decomposition |
EntropyDecomposition | UniqueElements | Four-entropies decomposition |
EntropyDecomposition | TransferOperator | Four-entropies decomposition |
Any of these estimators must be given as input to a CMIDecomposition estimator.
Estimation
- Example 1: EntropyDecomposition with TransferOperator outcome space.
The following utility functions and types are also useful for transfer entropy estimation.
Associations.optimize_marginals_te
— Functionoptimize_marginals_te([scheme = OptimiseTraditional()], s, t, [c]) → EmbeddingTE
Optimize marginal embeddings for transfer entropy computation from source time series s
to target time series t
, conditioned on c
if c
is given, using the provided optimization scheme
.
Associations.EmbeddingTE
— TypeEmbeddingTE(; dS = 1, dT = 1, dTf = 1, dC = 1, τS = -1, τT = -1, ηTf = 1, τC = -1)
EmbeddingTE(opt::OptimiseTraditional, s, t, [c])
EmbeddingTE provides embedding parameters for transfer entropy analysis using either TEShannon, TERenyiJizba, or in general any subtype of TransferEntropy.
The second method finds parameters using the "traditional" optimised embedding techniques from DynamicalSystems.jl.
Convention for generalized delay reconstruction
We use the following convention. Let $s(i)$ be time series for the source variable, $t(i)$ be the time series for the target variable and $c(i)$ the time series for the conditional variable. To compute transfer entropy, we need the following marginals:
\[\begin{aligned} T^{+} &= \{ t(i+\eta^1), t(i+\eta^2), \ldots, t(i+\eta^{d_{T^{+}}}) \} \\ T^{-} &= \{ t(i+\tau^0_{T}), t(i+\tau^1_{T}), t(i+\tau^2_{T}), \ldots, t(i + \tau^{d_{T} - 1}_{T}) \} \\ S^{-} &= \{ s(i+\tau^0_{S}), s(i+\tau^1_{S}), s(i+\tau^2_{S}), \ldots, s(i + \tau^{d_{S} - 1}_{S}) \} \\ C^{-} &= \{ c(i+\tau^0_{C}), c(i+\tau^1_{C}), c(i+\tau^2_{C}), \ldots, c(i + \tau^{d_{C} - 1}_{C}) \} \end{aligned}\]
Depending on the application, the delay reconstruction lags $\tau^k_{T} \leq 0$, $\tau^k_{S} \leq 0$, and $\tau^k_{C} \leq 0$ may be equally spaced or non-equally spaced. The same applies to the prediction lag(s), but typically only a single prediction lag $\eta^k$ is used (so that $d_{T^{+}} = 1$).
For transfer entropy, traditionally at least one $\tau^k_{T}$, one $\tau^k_{S}$ and one $\tau^k_{C}$ equals zero. This way, the $T^{-}$, $S^{-}$ and $C^{-}$ marginals always contain present/past states, while the $T^{+}$ marginal contains future states relative to the other marginals. However, this is not a strict requirement, and modern approaches that search for optimal embeddings can return embeddings without the instantaneous lag.
Combined, we get the generalized delay reconstruction $\mathbb{E} = (T^{+}_{(d_{T^{+}})}, T^{-}_{(d_{T})}, S^{-}_{(d_{S})}, C^{-}_{(d_{C})})$. Transfer entropy is then computed as
\[\begin{aligned} TE_{S \rightarrow T | C} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-, C^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-, C^-)}{P(T^{+} | T^-, C^-)}\right)}, \end{aligned}\]
or, if conditionals are not relevant,
\[\begin{aligned} TE_{S \rightarrow T} = \int_{\mathbb{E}} P(T^{+}, T^-, S^-) \log_{b}{\left(\frac{P(T^{+} | T^-, S^-)}{P(T^{+} | T^-)}\right)}, \end{aligned}\]
Here,
- $T^{+}$ denotes the $d_{T^{+}}$-dimensional set of vectors furnishing the future states of $T$ (almost always equal to 1 in practical applications),
- $T^{-}$ denotes the $d_{T}$-dimensional set of vectors furnishing the past and present states of $T$,
- $S^{-}$ denotes the $d_{S}$-dimensional set of vectors furnishing the past and present of $S$, and
- $C^{-}$ denotes the $d_{C}$-dimensional set of vectors furnishing the past and present of $C$.
Keyword arguments
- dS, dT, dC, dTf (f for future) are the dimensions of the $S^{-}$, $T^{-}$, $C^{-}$ and $T^{+}$ marginals. The parameters dS, dT, dC and dTf must each be a positive integer.
- τS, τT, τC are the embedding lags for $S^{-}$, $T^{-}$, $C^{-}$. Each parameter is an integer ∈ 𝒩⁰⁻, or a vector of integers ∈ 𝒩⁰⁻, so that $S^{-}$, $T^{-}$, $C^{-}$ always represent present/past values. If e.g. τT is an integer, then the $T^-$ marginal is constructed using lags $\tau_{T} = \{0, \tau, 2\tau, \ldots, (d_{T} - 1)\tau_T \}$. If it is a vector, e.g. τΤ = [-1, -5, -7], then the dimension dT must match the lags, and precisely those lags are used: $\tau_{T} = \{-1, -5, -7 \}$.
- The prediction lag(s) ηTf is a positive integer. Combined with the requirement that the other delay parameters are zero or negative, this ensures that we're always predicting from past/present to future. In typical applications, ηTf = 1 is used for transfer entropy.
Examples
Say we wanted to compute the Shannon transfer entropy $TE^S(S \to T) = I^S(T^+; S^- | T^-)$. Using some modern procedure for determining optimal embedding parameters using methods from DynamicalSystems.jl, we find that the optimal embedding of $T^{-}$ is three-dimensional and is given by the lags [0, -5, -8]
. Using the same procedure, we find that the optimal embedding of $S^{-}$ is two-dimensional with lags $[-1, -8]$. We want to predict a univariate version of the target variable one time step into the future (ηTf = 1
). The total embedding is then the set of embedding vectors
$E_{TE} = \{ (T(i+1), S(i-1), S(i-8), T(i), T(i-5), T(i-8)) \}$. Translating this to code, we get:
using Associations
julia> EmbeddingTE(dT=3, τT=[0, -5, -8], dS=2, τS=[-1, -8], ηTf=1)
# output
EmbeddingTE(dS=2, dT=3, dC=1, dTf=1, τS=[-1, -8], τT=[0, -5, -8], τC=-1, ηTf=1)
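A short sketch of how such an embedding could be passed on to a transfer entropy definition. The estimator below is an assumption (any TEShannon-compatible estimator would do), and the embedding parameters could instead come from optimize_marginals_te(OptimiseTraditional(), s, t):
using Associations
s, t = rand(2000), rand(2000)
# Hypothetical embedding parameters for the S⁻, T⁻ and T⁺ marginals.
emb = EmbeddingTE(dT = 2, τT = -1, dS = 2, τS = -1, ηTf = 1)
definition = TEShannon(embedding = emb)
est = EntropyDecomposition(definition, PlugIn(Shannon()), CodifyVariables(OrdinalPatterns(m = 3)))
association(est, s, t)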
Partial mutual information
Associations.PartialMutualInformation
— TypePartialMutualInformation <: MultivariateInformationMeasure
PartialMutualInformation(; base = 2)
The partial mutual information (PMI) measure of conditional association (Zhao et al., 2016).
Definition
PMI is defined for variables $X$, $Y$ and $Z$ as
\[PMI(X; Y | Z) = D(p(x, y, z) || p^{*}(x|z) p^{*}(y|z) p(z)),\]
where $p(x, y, z)$ is the joint distribution for $X$, $Y$ and $Z$, and $D(\cdot, \cdot)$ is the extended Kullback-Leibler divergence from $p(x, y, z)$ to $p^{*}(x|z) p^{*}(y|z) p(z)$. See Zhao et al. (2016) for details.
Estimation
The PMI is estimated by first estimating a 3D probability mass function using probabilities, then computing $PMI(X; Y | Z)$ from those probabilities.
Properties
For the discrete case, the following identities hold in theory (when estimating PMI, they may not).
- PMI(X, Y, Z) >= CMI(X, Y, Z) (where CMI is the Shannon CMI). Holds in theory, but when estimating PMI, the identity may not hold.
- PMI(X, Y, Z) >= 0. Holds both in theory and when estimating using discrete estimators.
- X ⫫ Y | Z => PMI(X, Y, Z) = CMI(X, Y, Z) = 0 (in theory, but not necessarily for estimation).
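For example, using the JointProbabilities estimator with an ordinal-pattern discretization, exactly as in the examples at the top of this page:
using Associations
x, y, z = rand(1000), rand(1000), rand(1000)
est = JointProbabilities(PartialMutualInformation(), CodifyVariables(OrdinalPatterns(m = 3)))
association(est, x, y, z)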
Short expansion of conditional mutual information
Associations.ShortExpansionConditionalMutualInformation
— TypeShortExpansionConditionalMutualInformation <: MultivariateInformationMeasure
ShortExpansionConditionalMutualInformation(; base = 2)
SECMI(; base = 2) # alias
The short expansion of (Shannon) conditional mutual information (SECMI) measure from Kubkowski et al. (2021).
Description
The SECMI measure is defined as
\[SECMI(X,Y|Z) = I(X,Y) + \sum_{k=1}^{m} II(X,Z_k,Y) = (1 - m) I(X,Y) + \sum_{k=1}^{m} I(X,Y|Z_k).\]
This quantity is estimated from data using one of the estimators below from the formula
\[\widehat{SECMI}(X,Y|Z) = \widehat{I}(X,Y) + \sum_{k=1}^{m} \widehat{II}(X,Z_k,Y) = (1 - m) \widehat{I}(X,Y) + \sum_{k=1}^{m} \widehat{I}(X,Y|Z_k).\]
Compatible estimators
Estimation
- Example 1: Estimating ShortExpansionConditionalMutualInformation using the JointProbabilities estimator with a CodifyVariables discretization and ValueBinning outcome space.
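A minimal sketch of that combination; the binning choice (RectangularBinning(3)) is an arbitrary assumption:
using Associations
x, y, z = rand(1000), rand(1000), rand(1000)
# Discretize each variable into 3 bins, then estimate SECMI from the
# resulting joint probability mass function.
disc = CodifyVariables(ValueBinning(RectangularBinning(3)))
association(JointProbabilities(SECMI(base = 2), disc), x, y, z)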
Correlation measures
Associations.CorrelationMeasure
— TypeCorrelationMeasure <: AssociationMeasure end
The supertype for correlation measures.
Concrete implementations
Associations.PearsonCorrelation
— TypePearsonCorrelation
The Pearson correlation of two variables.
Usage
- Use with association to compute the raw Pearson correlation coefficient.
- Use with independence to perform a formal hypothesis test for pairwise dependence using the Pearson correlation coefficient.
Description
The sample Pearson correlation coefficient for real-valued random variables $X$ and $Y$ with associated samples $\{x_i\}_{i=1}^N$ and $\{y_i\}_{i=1}^N$ is defined as
\[\rho_{xy} = \dfrac{\sum_{i=1}^N (x_i - \bar{x})(y_i - \bar{y}) }{\sqrt{\sum_{i=1}^N (x_i - \bar{x})^2}\sqrt{\sum_{i=1}^N (y_i - \bar{y})^2}},\]
where $\bar{x}$ and $\bar{y}$ are the means of the observations $\{x_i\}$ and $\{y_i\}$, respectively.
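A minimal usage example, following the association pattern shown at the top of this page:
using Associations
x = rand(1000)
y = 0.7 .* x .+ 0.3 .* rand(1000)   # y is linearly related to x
association(PearsonCorrelation(), x, y)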
Associations.PartialCorrelation
— TypePartialCorrelation <: AssociationMeasure
The correlation of two variables, with the effect of a set of conditioning variables removed.
Usage
- Use with association to compute the raw partial correlation coefficient.
- Use with independence to perform a formal hypothesis test for correlation-based conditional independence.
Description
There are several ways of estimating the partial correlation. We follow the matrix inversion method, because for StateSpaceSet
s, we can very efficiently compute the required joint covariance matrix $\Sigma$ for the random variables.
Formally, let $X_1, X_2, \ldots, X_n$ be a set of $n$ real-valued random variables. Consider the joint precision matrix $P = (p_{ij}) = \Sigma^{-1}$. The partial correlation of any pair of variables $(X_i, X_j)$, given the remaining variables $\bf{Z} = \{X_k\}_{k=1, k \neq i, j}^n$, is defined as
\[\rho_{X_i X_j | \bf{Z}} = -\dfrac{p_{ij}}{\sqrt{ p_{ii} p_{jj} }}\]
In practice, we compute the estimate
\[\hat{\rho}_{X_i X_j | \bf{Z}} = -\dfrac{\hat{p}_{ij}}{\sqrt{ \hat{p}_{ii} \hat{p}_{jj} }},\]
where $\hat{P} = \hat{\Sigma}^{-1}$ is the sample precision matrix.
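A minimal usage example; here x and y are spuriously correlated through the common driver z:
using Associations
z = rand(1000)
x = z .+ 0.2 .* rand(1000)
y = z .+ 0.2 .* rand(1000)
# Correlation between x and y with the linear effect of z removed.
association(PartialCorrelation(), x, y, z)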
Associations.DistanceCorrelation
— TypeDistanceCorrelation
The distance correlation (Székely et al., 2007) measure quantifies potentially nonlinear associations between pairs of variables. If applied to three variables, the partial distance correlation (Székely and Rizzo, 2014) is computed.
Usage
- Use with association to compute the raw (partial) distance correlation coefficient.
- Use with independence to perform a formal hypothesis test for pairwise dependence.
Description
The distance correlation can be used to compute the association between two variables, or the conditional association between three variables, like so:
association(DistanceCorrelation(), x, y) → dcor ∈ [0, 1]
association(DistanceCorrelation(), x, y, z) → pdcor
With two variables, we compute dcor, which is called the empirical/sample distance correlation (Székely et al., 2007). With three variables, the partial distance correlation pdcor is computed (Székely and Rizzo, 2014).
A partial distance correlation distance_correlation(X, Y, Z) = 0 doesn't always guarantee conditional independence X ⫫ Y | Z. See Székely and Rizzo (2014) for an in-depth discussion.
Associations.ChatterjeeCorrelation
— TypeChatterjeeCorrelation <: CorrelationMeasure
ChatterjeeCorrelation(; handle_ties = true, rng = Random.default_rng())
The Chatterjee correlation measure (Chatterjee, 2021) is an asymmetric measure of dependence between two variables.
If handle_ties == true
, then the first formula below is used. If you know for sure that there are no ties in your data, then set handle_ties == false
, which will use the second (faster) formula below.
When rearranging the input datasets, the second variable y
is sorted according to a sorting of the first variable x
. If x
has ties, then these ties are broken randomly and uniformly. For complete reproducibility in this step, you can specify rng
. If x
has no ties, then no randomization is performed.
Usage
- Use with association to compute the raw Chatterjee correlation coefficient.
- Use with SurrogateAssociationTest to perform a surrogate test for significance of a Chatterjee-type association (example). When using a surrogate test for significance, the first input variable is shuffled according to the given surrogate method.
Description
The correlation statistic is defined as
\[\epsilon_n(X, Y) = 1 - \dfrac{n\sum_{i=1}^{n-1} |r_{i+1} - r_i|}{2\sum_{i=1}^n l_i(n - l_i)},\]
where $r_i$ and $l_i$ are rank-based quantities computed from the rearranged data ($l_i$ is the number of $j$ such that $y_j \geq y_i$).
When there are no ties among the $Y_1, Y_2, \ldots, Y_n$, the measure is
\[\epsilon_n(X, Y) = 1 - \dfrac{3\sum_{i=1}^{n-1} |r_{i+1} - r_i|}{n^2 - 1}.\]
This statistic estimates a quantity proposed by Dette et al. (2013), as indicated in Shi et al. (2022). It can therefore also be called the Chatterjee-Dette-Siburg-Stoimenov correlation coefficient.
Estimation
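A minimal usage example; note that the measure is asymmetric in its arguments:
using Associations
x = rand(1000)
y = sin.(2π .* x) .+ 0.1 .* rand(1000)   # y is a noisy nonlinear function of x
association(ChatterjeeCorrelation(), x, y)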
Associations.AzadkiaChatterjeeCoefficient
— TypeAzadkiaChatterjeeCoefficient <: AssociationMeasure
AzadkiaChatterjeeCoefficient(; theiler::Int = 0)
The Azadkia-Chatterjee coefficient (Azadkia and Chatterjee, 2021) is a coefficient for pairwise and conditional association inspired by the Chatterjee-Dette-Siburg-Stoimenov coefficient (Chatterjee, 2021; Dette et al., 2013) (see ChatterjeeCorrelation
).
Usage
- Use with association to compute the raw Azadkia-Chatterjee coefficient.
- Use with SurrogateAssociationTest to perform a surrogate test for significance of a pairwise or conditional Azadkia-Chatterjee-type association (example). When using a surrogate test for significance, only the first input variable is shuffled according to the given surrogate method.
- Use with LocalPermutationTest to perform a test of conditional independence (example).
Description
The conditional statistic is
\[T_n(Y, \boldsymbol{Z} | \boldsymbol{X}) = \dfrac{\sum_{i=1}^n \left( \min{(R_i, R_{M_{(i)}})} - \min{(R_i, R_{N_{(i)}})} \right) }{\sum_{i=1}^n \left(R_i - \min{(R_i, R_{N_{(i)}})} \right)}.\]
where $R_i$ is the rank of the point $Y_i$ among all $Y_i$s, and $M_{(i)}$ and $N_{(i)}$ are indices of nearest neighbors of the points $\boldsymbol{X}_i$ and $(\boldsymbol{X}_i, \boldsymbol{Z}_i)$, respectively (given appropriately constructed marginal spaces). The theiler
keyword argument is an integer controlling the number of nearest neighbors to exclude during neighbor searches. The Theiler window defaults to 0
, which excludes self-neighbors, and is the only option considered in Azadkia and Chatterjee (2021).
In the case where $\boldsymbol{X}$ has no components (i.e. we're not conditioning), we also consider $L_i$ as the number of $j$ such that $Y_j \geq Y_i$. The measure is then defined as
\[T_n(Y, \boldsymbol{Z}) = \dfrac{\sum_{i=1}^n \left( n \min{(R_i, R_{M_{(i)}})} - L_i^2 \right) }{\sum_{i=1}^n \left( L_i (n - L_i) \right)}.\]
The value of the coefficient is on [0, 1]
when the number of samples goes to ∞
, but is not restricted to this interval in practice.
Input data
If the input data contain duplicate points, consider adding a small magnitude of noise to the input data. Otherwise, errors will occur when locating nearest neighbors.
Estimation
- Example 1. Estimating the Azadkia-Chatterjee coefficient to quantify associations for a chain of unidirectionally coupled variables, showcasing both pairwise and conditional associations.
- Example 2. Using SurrogateAssociationTest in combination with the Azadkia-Chatterjee coefficient to quantify significance of pairwise and conditional associations.
- Example 3. Using LocalPermutationTest in combination with the Azadkia-Chatterjee coefficient to perform a test for conditional independence.
Cross-map measures
The cross-map measures define different ways of quantifying association based on the concept of "cross mapping", which has appeared in many contexts in the literature, and gained huge popularity with Sugihara et al. (2012)'s paper on convergent cross mapping.
Since their paper, several cross mapping methods and frameworks have emerged in the literature. In Associations.jl, we provide a unified interface for using these cross mapping methods.
Associations.CrossmapMeasure
— TypeCrossmapMeasure <: AssociationMeasure
The supertype for all cross-map measures. Concrete subtypes are
- ConvergentCrossMapping, or CCM for short.
- PairwiseAsymmetricInference, or PAI for short.
See also: CrossmapEstimator
.
Associations.ConvergentCrossMapping
— TypeConvergentCrossMapping <: CrossmapMeasure
ConvergentCrossMapping(; d::Int = 2, τ::Int = -1, w::Int = 0,
f = Statistics.cor, embed_warn = true)
The convergent cross mapping measure (Sugihara et al., 2012).
Usage
- Use with association together with a CrossmapEstimator to compute the cross-map correlation between input variables.
Compatible estimators
Description
The Theiler window w
controls how many temporal neighbors are excluded during neighbor searches (w = 0
means that only the point itself is excluded). f
is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor
, gives the Pearson correlation coefficient).
Embedding
Let S(i)
be the source time series variable and T(i)
be the target time series variable. This version produces regular embeddings with fixed dimension d
and embedding lag τ
as follows:
\[( S(i), S(i+\tau), S(i+2\tau), \ldots, S(i+(d-1)\tau), T(i))_{i=1}^{N-(d-1)\tau}.\]
In this joint embedding, neighbor searches are performed in the subspace spanned by the first D-1
variables, while the last (D
-th) variable is to be predicted.
With this convention, τ < 0
implies "past/present values of source used to predict target", and τ > 0
implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0
(embed_warn = false
turns off warnings).
Estimation
- Example 1. Estimation with RandomVectors estimator.
- Example 2. Estimation with RandomSegment estimator.
- Example 3: Reproducing figures from Sugihara et al. (2012).
Associations.PairwiseAsymmetricInference
— TypePairwiseAsymmetricInference <: CrossmapMeasure
PairwiseAsymmetricInference(; d::Int = 2, τ::Int = -1, w::Int = 0,
f = Statistics.cor, embed_warn = true)
The pairwise asymmetric inference (PAI) measure (McCracken and Weigel, 2014) is a version of ConvergentCrossMapping
that searches for neighbors in mixed embeddings (i.e. both source and target variables included); otherwise, the algorithms are identical.
Usage
- Use with
association
to compute the pairwise asymmetric inference measure between variables.
Compatible estimators
Description
The Theiler window w
controls how many temporal neighbors are excluded during neighbor searches (w = 0
means that only the point itself is excluded). f
is a function that computes the agreement between observations and predictions (the default, f = Statistics.cor
, gives the Pearson correlation coefficient).
Embedding
There are many possible ways of defining the embedding for PAI. Currently, we only implement the "add one non-lagged source timeseries to an embedding of the target" approach, which is used as an example in McCracken & Weigel's paper. Specifically: Let S(i)
be the source time series variable and T(i)
be the target time series variable. PairwiseAsymmetricInference
produces regular embeddings with fixed dimension d
and embedding lag τ
as follows:
\[(S(i), T(i+(d-1)\tau), \ldots, T(i+2\tau), T(i+\tau), T(i))_{i=1}^{N-(d-1)\tau}.\]
In this joint embedding, neighbor searches are performed in the subspace spanned by the first D
variables, while the last variable is to be predicted.
With this convention, τ < 0
implies "past/present values of source used to predict target", and τ > 0
implies "future/present values of source used to predict target". The latter case may not be meaningful for many applications, so by default, a warning will be given if τ > 0
(embed_warn = false
turns off warnings).
Estimation
- Example 1. Estimation with RandomVectors estimator.
- Example 2. Estimation with RandomSegment estimator.
- Example 3. Reproducing McCracken & Weigel's results from the original paper.
Closeness measures
Associations.ClosenessMeasure
— TypeClosenessMeasure <: AssociationMeasure
The supertype for all closeness measures.
Implementations
Associations.JointDistanceDistribution
— TypeJointDistanceDistribution <: AssociationMeasure end
JointDistanceDistribution(; metric = Euclidean(), B = 10, D = 2, τ = -1, μ = 0.0)
The joint distance distribution (JDD) measure (Amigó and Hirata, 2018).
Usage
- Use with association to compute the joint distance distribution measure Δ from Amigó and Hirata (2018).
- Use with independence to perform a formal hypothesis test for directional dependence.
Keyword arguments
- distance_metric::Metric: An instance of a valid distance metric from Distances.jl. Defaults to Euclidean().
- B::Int: The number of equidistant subintervals to divide the interval [0, 1] into when comparing the normalised distances.
- D::Int: Embedding dimension.
- τ::Int: Embedding delay. By convention, τ is negative.
- μ: The hypothetical mean value of the joint distance distribution if there is no coupling between x and y (default is μ = 0.0).
Description
From input time series $x(t)$ and $y(t)$, we first construct the delay embeddings (note the positive sign in the embedding lags; therefore the input parameter τ
is by convention negative).
\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau}, \ldots, x_{i+(d_x - 1)\tau}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau}, \ldots, y_{i+(d_y - 1)\tau}) \} \\ \end{align*}\]
The algorithm then proceeds to analyze the distribution of distances between points of these embeddings, as described in Amigó and Hirata (2018).
Examples
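A minimal usage sketch with toy data; the keyword values shown are the documented defaults:
using Associations
x = rand(1000)
y = 0.6 .* circshift(x, 1) .+ 0.4 .* rand(1000)   # y depends on past values of x
association(JointDistanceDistribution(D = 2, B = 10), x, y)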
Associations.SMeasure
— TypeSMeasure <: ClosenessMeasure
SMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
SMeasure is a bivariate association measure from Arnhold et al. (1999) and Quiroga et al. (2000) that measures directional dependence between two input (potentially multivariate) time series.
Note that τx
and τy
are negative; see explanation below.
Usage
- Use with association to compute the raw s-measure statistic.
- Use with independence to perform a formal hypothesis test for directional dependence.
Description
The steps of the algorithm are:
- From input time series $x(t)$ and $y(t)$, construct the delay embeddings (note the positive sign in the embedding lags; therefore the input parameters τx and τy are by convention negative).
\[\begin{align*} \{\bf{x}_i \} &= \{(x_i, x_{i+\tau_x}, \ldots, x_{i+(d_x - 1)\tau_x}) \} \\ \{\bf{y}_i \} &= \{(y_i, y_{i+\tau_y}, \ldots, y_{i+(d_y - 1)\tau_y}) \} \\ \end{align*}\]
- Let $r_{i,j}$ and $s_{i,j}$ be the indices of the K-th nearest neighbors of $\bf{x}_i$ and $\bf{y}_i$, respectively. Neighbors closer than w time indices are excluded during searches (i.e. w is the Theiler window).
- Compute the mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, using the indices $r_{i, j}$.
\[R_i^{(k)}(x) = \dfrac{1}{k} \sum_{j=1}^{k}\left(\bf{x}_i - \bf{x}_{r_{i,j}}\right)^2\]
- Compute the y-conditioned mean squared Euclidean distance to the $K$ nearest neighbors for each $x_i$, now using the indices $s_{i,j}$.
\[R_i^{(k)}(x|y) = \dfrac{1}{k} \sum_{j=1}^{k}\left(\bf{x}_i - \bf{x}_{s_{i,j}}\right)^2\]
- Define the following measure of independence, where $0 \leq S \leq 1$, and low values indicate independence and values close to one occur for synchronized signals.
\[S^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \dfrac{R_i^{(k)}(x)}{R_i^{(k)}(x|y)}\]
Input data
The algorithm is slightly modified from (Arnhold et al., 1999) to allow univariate timeseries as input.
- If x and y are StateSpaceSets, then use x and y as is and ignore the parameters dx/τx and dy/τy.
- If x and y are scalar time series, then create dx and dy dimensional embeddings, respectively, of both x and y, resulting in N different m-dimensional embedding points $X = \{x_1, x_2, \ldots, x_N \}$ and $Y = \{y_1, y_2, \ldots, y_N \}$. τx and τy control the embedding lags for x and y.
- If x is a scalar-valued vector and y is a StateSpaceSet, or vice versa, then create an embedding of the scalar timeseries using parameters dx/τx or dy/τy.
In all three cases, input StateSpaceSets are length-matched by eliminating points at the end of the longest StateSpaceSet (after the embedding step, if relevant) before analysis.
See also: ClosenessMeasure
.
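A minimal usage example mirroring the LMeasure call shown at the top of this page:
using Associations
x, y = rand(1000), rand(1000)
association(SMeasure(K = 3, dx = 2, dy = 2), x, y)   # raw s-measure statistic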
Associations.HMeasure
— TypeHMeasure <: AssociationMeasure
HMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The HMeasure
(Arnhold et al., 1999) is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with association to compute the raw h-measure statistic.
- Use with independence to perform a formal hypothesis test for directional dependence.
Description
The HMeasure
(Arnhold et al., 1999) is similar to the SMeasure
, but the numerator of the formula is replaced by $R_i(x)$, the mean squared Euclidean distance to all other points, and there is a $\log$-term inside the sum:
\[H^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x)}{R_i^{(k)}(x|y)} \right).\]
Parameters are the same and $R_i^{(k)}(x|y)$ is computed as for SMeasure
.
See also: ClosenessMeasure
.
Associations.MMeasure
— TypeMMeasure <: ClosenessMeasure
MMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The MMeasure
(Andrzejak et al., 2003) is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with association to compute the raw m-measure statistic.
- Use with independence to perform a formal hypothesis test for directional dependence.
Description
The MMeasure
is based on SMeasure
and HMeasure
. It is given by
\[M^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{R_i(x) - R_i^{(k)}(x|y)}{R_i(x) - R_i^k(x)} \right),\]
where $R_i(x)$ is computed as for HMeasure
, while $R_i^k(x)$ and $R_i^{(k)}(x|y)$ is computed as for SMeasure
. Parameters also have the same meaning as for SMeasure
/HMeasure
.
See also: ClosenessMeasure
.
Associations.LMeasure
— TypeLMeasure <: ClosenessMeasure
LMeasure(; K::Int = 2, dx = 2, dy = 2, τx = - 1, τy = -1, w = 0)
The LMeasure
(Chicharro and Andrzejak, 2009) is a pairwise association measure. It quantifies the probability with which close states of a target timeseries/embedding are mapped to close states of a source timeseries/embedding.
Note that τx
and τy
are negative by convention. See docstring for SMeasure
for an explanation.
Usage
- Use with association to compute the raw L-measure statistic.
- Use with independence to perform a formal hypothesis test for directional dependence.
Description
LMeasure
is similar to MMeasure
, but uses distance ranks instead of the raw distances.
Let $\bf{x_i}$ be an embedding vector, and let $g_{i,j}$ denote the rank of the distance between $\bf{x_i}$ and some other vector $\bf{x_j}$ in a sorted ascending list of distances between $\bf{x_i}$ and all $\bf{x_{j \neq i}}$. In other words, $g_{i,j}$ is just the rank of that distance among the $N-1$ nearest-neighbor distances, sorted in ascending order.
LMeasure
is then defined as
\[L^{(k)}(x|y) = \dfrac{1}{N} \sum_{i=1}^{N} \log \left( \dfrac{G_i(x) - G_i^{(k)}(x|y)}{G_i(x) - G_i^k(x)} \right),\]
where $G_i(x) = \frac{N}{2}$ and $G_i^K(x) = \frac{k+1}{2}$ are the mean and minimal rank, respectively.
The $y$-conditioned mean rank is defined as
\[G_i^{(k)}(x|y) = \dfrac{1}{K}\sum_{j=1}^{K} g_{i,w_{i, j}},\]
where $w_{i,j}$ is the index of the $j$-th nearest neighbor of $\bf{y_i}$.
See also: ClosenessMeasure
.
Recurrence measures
Associations.MCR
— TypeMCR <: AssociationMeasure
MCR(; r, metric = Euclidean())
An association measure based on mean conditional probabilities of recurrence (MCR) introduced by Romano et al. (2007).
Usage
- Use with association to compute the raw MCR for pairwise or conditional association.
- Use with IndependenceTest to perform a formal hypothesis test for pairwise or conditional association.
Description
r is a mandatory keyword which specifies the recurrence threshold when constructing recurrence matrices. It can be an instance of any subtype of AbstractRecurrenceType from RecurrenceAnalysis.jl. To use any r that is not a real number, you have to do using RecurrenceAnalysis first. The metric is any valid metric from Distances.jl.
For input variables X
and Y
, the conditional probability of recurrence is defined as
\[M(X | Y) = \dfrac{1}{N} \sum_{i=1}^N p(\bf{y_i} | \bf{x_i}) = \dfrac{1}{N} \sum_{i=1}^N \dfrac{\sum_{j=1}^N J_{R_{i, j}}^{X, Y}}{\sum_{j=1}^N R_{i, j}^X},\]
where $R_{i, j}^X$ is the recurrence matrix and $J_{R_{i, j}}^{X, Y}$ is the joint recurrence matrix, constructed using the given metric
. The measure $M(Y | X)$ is defined analogously.
Romano et al. (2007)'s interpretation of this quantity is that if X
drives Y
, then M(X|Y) > M(Y|X)
, if Y
drives X
, then M(Y|X) > M(X|Y)
, and if coupling is symmetric, then M(Y|X) = M(X|Y)
.
Input data
X
and Y
can be either both univariate timeseries, or both multivariate StateSpaceSet
s.
Estimation
- Example 1. Pairwise versus conditional MCR.
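A minimal usage sketch; the recurrence threshold r = 0.5 is an arbitrary choice:
using Associations
x = rand(500)
y = 0.7 .* x .+ 0.3 .* rand(500)
association(MCR(r = 0.5), x, y)   # raw MCR value for the pair (x, y)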
Associations.RMCD
— TypeRMCD <: AssociationMeasure
RMCD(; r, metric = Euclidean(), base = 2)
The recurrence measure of conditional dependence, or RMCD (Ramos et al., 2017), is a recurrence-based measure that mimics the conditional mutual information, but uses recurrence probabilities.
Usage
- Use with association to compute the raw RMCD for pairwise or conditional association.
- Use with IndependenceTest to perform a formal hypothesis test for pairwise or conditional association.
Description
r is a mandatory keyword which specifies the recurrence threshold when constructing recurrence matrices. It can be an instance of any subtype of AbstractRecurrenceType from RecurrenceAnalysis.jl. To use any r that is not a real number, you have to do using RecurrenceAnalysis first. The metric is any valid metric from Distances.jl.
Both the pairwise and conditional RMCD are non-negative, but due to round-off error, negative values may occur. If that happens, an RMCD value of 0.0 is returned.
Description
The RMCD measure is defined by
\[I_{RMCD}(X; Y | Z) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y, Z} \log \left( \dfrac{\sum_{j} R_{ij}^{X, Y, Z} \sum_{j} R_{ij}^{Z} }{\sum_{j} R_{ij}^{X, Z} \sum_{j} R_{ij}^{Y, Z}} \right) \right],\]
where base
controls the base of the logarithm. $I_{RMCD}(X; Y | Z)$ is zero when $Z = X$, $Z = Y$ or when $X$, $Y$ and $Z$ are mutually independent.
Our implementation allows dropping the third/last argument, in which case the following mutual information-like quantity is computed (not discussed in Ramos et al. (2017)).
\[I_{RMCD}(X; Y) = \dfrac{1}{N} \sum_{i} \left[ \dfrac{1}{N} \sum_{j} R_{ij}^{X, Y} \log \left( \dfrac{\sum_{j} R_{ij}^{X} R_{ij}^{Y} }{\sum_{j} R_{ij}^{X, Y}} \right) \right]\]
Estimation
- Example 1. Pairwise versus conditional RMCD.
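A minimal usage sketch of pairwise and conditional RMCD; the recurrence threshold r = 0.5 is an arbitrary choice:
using Associations
x, y, z = rand(500), rand(500), rand(500)
association(RMCD(r = 0.5), x, y)      # pairwise, mutual information-like quantity
association(RMCD(r = 0.5), x, y, z)   # conditioned on z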