Conditional mutual information
CMIShannon
Estimation using ConditionalMutualInformationEstimators
When CMI is estimated using a dedicated ConditionalMutualInformationEstimator, some form of bias correction is usually applied. The FPVP estimator is a popular choice.
CMIShannon with GaussianCMI
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = randn(n)
y = randn(n) .+ x
z = randn(n) .+ y
condmutualinfo(GaussianCMI(), x, z, y) # defaults to `CMIShannon()`
0.0017375851072253257
CMIShannon with FPVP
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(FPVP(k = 5), x, z, y) # defaults to `CMIShannon()`
-0.14002755285440707
CMIShannon with MesnerShalizi
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(MesnerShalizi(k = 10), x, z, y) # defaults to `CMIShannon()`
-0.12043962640339888
CMIShannon with Rahimzamani
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 10), Rahimzamani(k = 10), x, z, y)
-0.03165026116054392
CMIRenyiPoczos with PoczosSchneiderCMI
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIRenyiPoczos(base = 2, q = 1.2), PoczosSchneiderCMI(k = 5), x, z, y)
-0.3927828474337882
Estimation using MutualInformationEstimators

Any MutualInformationEstimator can also be used to compute conditional mutual information via the chain rule of mutual information. However, applying these estimators naively performs no bias correction when taking the difference of the two mutual information terms.
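In symbols, the chain-rule decomposition used here is

    I(X; Z | Y) = I(X; (Y, Z)) - I(X; Y)

so the CMI is obtained as the difference of two separately estimated mutual information terms, and the estimation biases of the two terms do not cancel in general.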
CMIShannon with KSG1
using CausalityTools
using Distributions
using Statistics
n = 1000
# A chain X → Y → Z
x = rand(Normal(-1, 0.5), n)
y = rand(BetaPrime(0.5, 1.5), n) .+ x
z = rand(Chisq(100), n)
z = (z ./ std(z)) .+ y
# We expect zero (in practice: very low) CMI when computing I(X; Z | Y), because
# the link between X and Z is exclusively through Y, so when observing Y,
# X and Z should appear independent.
condmutualinfo(CMIShannon(base = 2), KSG1(k = 5), x, z, y)
-0.38323689721061016
Estimation using DifferentialEntropyEstimators

Any DifferentialEntropyEstimator can also be used to compute conditional mutual information via a sum of entropies. However, applying these estimators naively performs no bias correction when taking the sum of the entropy terms.
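The underlying four-entropies decomposition is

    I(X; Z | Y) = H(X, Y) + H(Y, Z) - H(X, Y, Z) - H(Y)

where each (differential) entropy term is estimated separately, so the individual estimation biases do not cancel in general.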
CMIShannon with Kraskov
using CausalityTools
using Distributions
n = 1000
# X drives Y; Z is independent of both, so I(X; Z | Y) should be ≈ 0.
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
condmutualinfo(CMIShannon(), Kraskov(k = 5), x, z, y)
-0.2831772952525391
Estimation using ProbabilitiesEstimators

Any ProbabilitiesEstimator can also be used to compute conditional mutual information via a sum of entropies. However, applying these estimators naively performs no bias correction when taking the sum of the entropy terms.
CMIShannon with ValueHistogram
using CausalityTools
using Distributions
n = 1000
# X drives Y; Z is independent of both,
# so we expect I(X; Z | Y) ≈ 0, while I(X; Y | Z) > 0.
x = rand(Epanechnikov(0.5, 1.0), n)
y = rand(Erlang(1), n) .+ x
z = rand(FDist(5, 2), n)
est = ValueHistogram(RectangularBinning(5))
condmutualinfo(CMIShannon(), est, x, z, y), condmutualinfo(CMIShannon(), est, x, y, z)
(0.003883726945883348, 0.14544599577667588)