Conditional mutual information

CMIShannon

Estimation using ConditionalMutualInformationEstimators

When conditional mutual information is estimated using a dedicated ConditionalMutualInformationEstimator, some form of bias correction is usually applied. The FPVP estimator is a popular choice.

CMIShannon with GaussianCMI

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y
condmutualinfo(GaussianCMI(), x, z, y) # defaults to `CMIShannon()`
0.0017375851072253257

CMIShannon with FPVP

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(FPVP(k = 5), x, z, y) # defaults to `CMIShannon()`
-0.14002755285440707

CMIShannon with MesnerShalizi

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(MesnerShalizi(k = 10), x, z, y) # defaults to `CMIShannon()`
-0.12043962640339888

CMIShannon with Rahimzamani

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 10), Rahimzamani(k = 10), x, z, y)
-0.03165026116054392

CMIRenyiPoczos with PoczosSchneiderCMI

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIRenyiPoczos(base = 2, q = 1.2), PoczosSchneiderCMI(k = 5), x, z, y)
-0.3927828474337882

Estimation using MutualInformationEstimators

Any MutualInformationEstimator can also be used to compute conditional mutual information via the chain rule of mutual information. However, naively applying these estimators in this way performs no bias correction when taking the difference of mutual information terms.
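
Concretely, the chain rule gives I(X; Z | Y) = I(X; Y, Z) - I(X; Y), so the mutual information estimator is applied twice and the results subtracted. The sketch below computes this difference by hand with KSG1; it assumes the joint variable (Y, Z) can be formed with the StateSpaceSet constructor and passed directly to mutualinfo, so treat it as an illustration rather than canonical usage.

using CausalityTools

n = 1000
# A chain X → Y → Z, as in the examples above
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y

est = KSG1(k = 5)
yz = StateSpaceSet(y, z)                      # joint variable (Y, Z); assumed constructor
mi_xyz = mutualinfo(MIShannon(), est, x, yz)  # I(X; Y, Z)
mi_xy = mutualinfo(MIShannon(), est, x, y)    # I(X; Y)
mi_xyz - mi_xy                                # ≈ I(X; Z | Y); no bias correction applied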

CMIShannon with KSG1

using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y

# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 2), KSG1(k = 5), x, z, y)
-0.38323689721061016

Estimation using DifferentialEntropyEstimators

Any DifferentialEntropyEstimator can also be used to compute conditional mutual information via a sum of entropies. However, naively applying these estimators in this way performs no bias correction when taking the sum of entropy terms.
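
Here the relevant decomposition is I(X; Z | Y) = h(X, Y) + h(Y, Z) - h(X, Y, Z) - h(Y), where h denotes differential entropy. The sketch below evaluates the four terms directly with Kraskov; it assumes joint variables can be formed with the StateSpaceSet constructor and that entropy(Shannon(), est, x) is the available call signature, so treat it as an illustration rather than canonical usage.

using CausalityTools

n = 1000
# A chain X → Y → Z, as in the examples above
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y

est = Kraskov(k = 5)
h_xy = entropy(Shannon(), est, StateSpaceSet(x, y))      # h(X, Y)
h_yz = entropy(Shannon(), est, StateSpaceSet(y, z))      # h(Y, Z)
h_xyz = entropy(Shannon(), est, StateSpaceSet(x, y, z))  # h(X, Y, Z)
h_y = entropy(Shannon(), est, y)                         # h(Y)
h_xy + h_yz - h_xyz - h_y                                # ≈ I(X; Z | Y); no bias correction applied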

CMIShannon with Kraskov

using CausalityTools
using Distributions
n = 1000
# X drives Y; Z is independent of both, so I(X; Z | Y) should be near zero
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
condmutualinfo(CMIShannon(), Kraskov(k = 5), x, z, y)
-0.2831772952525391

Estimation using ProbabilitiesEstimators

Any ProbabilitiesEstimator can also be used to compute conditional mutual information via a sum of (discrete) entropies. However, naively applying these estimators in this way performs no bias correction when taking the sum of entropy terms.
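
The decomposition is the same as above, I(X; Z | Y) = H(X, Y) + H(Y, Z) - H(X, Y, Z) - H(Y), but with each (discrete) entropy computed from probabilities estimated over the same partition. A sketch under the same API assumptions as the previous one (joint variables via StateSpaceSet, entropies via entropy(Shannon(), est, x)):

using CausalityTools

n = 1000
# A chain X → Y → Z, as in the examples above
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y

est = ValueHistogram(RectangularBinning(5))
H_xy = entropy(Shannon(), est, StateSpaceSet(x, y))      # H(X, Y)
H_yz = entropy(Shannon(), est, StateSpaceSet(y, z))      # H(Y, Z)
H_xyz = entropy(Shannon(), est, StateSpaceSet(x, y, z))  # H(X, Y, Z)
H_y = entropy(Shannon(), est, y)                         # H(Y)
H_xy + H_yz - H_xyz - H_y                                # ≈ I(X; Z | Y); no bias correction applied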

CMIShannon with ValueHistogram

using CausalityTools
using Distributions
n = 1000
# X drives Y; Z is independent of both, so we expect I(X; Z | Y) ≈ 0 but I(X; Y | Z) > 0
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
est = ValueHistogram(RectangularBinning(5))
condmutualinfo(CMIShannon(), est, x, z, y), condmutualinfo(CMIShannon(), est, x, y, z)
(0.003883726945883348, 0.14544599577667588)