# Conditional mutual information

## `CMIShannon`

### Estimation using `ConditionalMutualInformationEstimator`s
When estimated using a `ConditionalMutualInformationEstimator`, some form of bias correction is usually applied. The `FPVP` estimator is a popular choice.
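The examples below use two equivalent calling conventions. Here is a minimal, runnable sketch of both (the data, estimator, and parameters are arbitrary placeholders):

```julia
using CausalityTools
x, y, z = randn(1000), randn(1000), randn(1000)
# If only an estimator is given, the measure defaults to `CMIShannon()`...
condmutualinfo(FPVP(k = 5), x, z, y)
# ...or the measure can be passed explicitly, e.g. to change the logarithm base.
condmutualinfo(CMIShannon(base = 2), FPVP(k = 5), x, z, y)
```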
#### `CMIShannon` with `GaussianCMI`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y
condmutualinfo(GaussianCMI(), x, z, y) # defaults to `CMIShannon()`
```

```
0.004011491470507889
```
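Because the variables are jointly Gaussian and the dependence between X and Z is mediated entirely by Y, the true value of I(X; Z | Y) is zero, and the parametric `GaussianCMI` estimate is correspondingly close to zero.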
#### `CMIShannon` with `FPVP`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(FPVP(k = 5), x, z, y) # defaults to `CMIShannon()`
```

```
-0.12095122738568737
```
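Note that nearest-neighbor-based estimators such as `FPVP` can return small negative numbers when the true CMI is zero; a slightly negative estimate like this one is consistent with conditional independence.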
#### `CMIShannon` with `MesnerShalizi`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(MesnerShalizi(k = 10), x, z, y) # defaults to `CMIShannon()`
```

```
-0.15097897401855725
```
#### `CMIShannon` with `Rahimzamani`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 10), Rahimzamani(k = 10), x, z, y)
```

```
-0.0345454675857545
```
#### `CMIRenyiPoczos` with `PoczosSchneiderCMI`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIRenyiPoczos(base = 2, q = 1.2), PoczosSchneiderCMI(k = 5), x, z, y)
```

```
-0.4063382485614671
```
### Estimation using `MutualInformationEstimator`s
Any `MutualInformationEstimator` can also be used to compute conditional mutual information using the chain rule of mutual information. However, the naive application of these estimators doesn't perform any bias correction when taking the difference of mutual information terms.
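Concretely, the chain rule decomposes the CMI as a difference of two mutual information terms,

```math
I(X; Z \mid Y) = I(X; Y, Z) - I(X; Y),
```

and since the two terms are estimated separately, their biases do not cancel in general.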
#### `CMIShannon` with `KSG1`
```julia
using CausalityTools
using Distributions
using Statistics

n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 2), KSG1(k = 5), x, z, y)
```

```
-0.4727851450562168
```
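For illustration, the chain-rule difference can also be computed manually. A minimal sketch, assuming `mutualinfo` accepts an explicit `MIShannon` measure and that variables can be stacked into a `StateSpaceSet` (re-exported by CausalityTools); check the package documentation before relying on these details:

```julia
using CausalityTools
using Distributions
using Statistics
n = 1000
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
est = KSG1(k = 5)
# I(X; Z | Y) = I(X; (Y, Z)) - I(X; Y): two separately estimated
# (and separately biased) mutual information terms.
mi_x_yz = mutualinfo(MIShannon(base = 2), est, x, StateSpaceSet(y, z))
mi_x_y  = mutualinfo(MIShannon(base = 2), est, x, y)
mi_x_yz - mi_x_y
```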
### Estimation using `DifferentialEntropyEstimator`s
Any `DifferentialEntropyEstimator` can also be used to compute conditional mutual information using a sum of entropies. However, the naive application of these estimators doesn't perform any bias correction when taking the sum of entropy terms.
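The decomposition in question writes the CMI as a signed sum of four differential entropies,

```math
I(X; Z \mid Y) = h(X, Y) + h(Y, Z) - h(X, Y, Z) - h(Y),
```

where each term is estimated separately, so the individual biases accumulate rather than cancel.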
#### `CMIShannon` with `Kraskov`
```julia
using CausalityTools
using Distributions

n = 1000
# X and Y are coupled; Z is independent of both,
# so I(X; Z | Y) should be zero.
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
condmutualinfo(CMIShannon(), Kraskov(k = 5), x, z, y)
```

```
-0.29815906572029816
```
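To see where the bias comes from, the four entropy terms can be evaluated individually. A minimal sketch, assuming `entropy(est, x)` (from ComplexityMeasures, re-exported by CausalityTools) accepts `StateSpaceSet`s for the multivariate terms; treat this as a sketch rather than canonical API:

```julia
using CausalityTools
using Distributions
n = 1000
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
est = Kraskov(k = 5)
# I(X; Z | Y) = h(X, Y) + h(Y, Z) - h(X, Y, Z) - h(Y); each term
# carries its own estimation bias, and the biases need not cancel.
h_xy  = entropy(est, StateSpaceSet(x, y))
h_yz  = entropy(est, StateSpaceSet(y, z))
h_xyz = entropy(est, StateSpaceSet(x, y, z))
h_y   = entropy(est, y)
h_xy + h_yz - h_xyz - h_y
```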
### Estimation using `ProbabilitiesEstimator`s
Any `ProbabilitiesEstimator` can also be used to compute conditional mutual information using a sum of entropies, in this case the discrete (plug-in) analogue of the decomposition above. However, the naive application of these estimators doesn't perform any bias correction when taking the sum of entropy terms.
#### `CMIShannon` with `ValueHistogram`
```julia
using CausalityTools
using Distributions

n = 1000
# X and Y are coupled; Z is independent of both.
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
est = ValueHistogram(RectangularBinning(5))
condmutualinfo(CMIShannon(), est, x, z, y), condmutualinfo(CMIShannon(), est, x, y, z)
```

```
(0.0017888259205205426, 0.10293518214892597)
```
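As expected, the estimate of I(X; Z | Y) is close to zero, while I(X; Y | Z) is clearly positive, reflecting the genuine coupling between X and Y. Histogram-based estimates are also sensitive to the discretization, so it is worth checking how the result varies with the binning. A minimal sketch (the bin counts here are arbitrary):

```julia
using CausalityTools
using Distributions
n = 1000
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
# Coarser or finer bins can change the estimate noticeably.
for nbins in (3, 5, 10)
    est = ValueHistogram(RectangularBinning(nbins))
    @show nbins, condmutualinfo(CMIShannon(), est, x, z, y)
end
```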