Counting the number of measures in ComplexityMeasures.jl
In this page we will count all the possible complexity measures than one can compute with the current version of ComplexityMeasures.jl!
using ComplexityMeasures
using InteractiveUtils: subtypes
import Pkg; Pkg.status("ComplexityMeasures")
Status `~/work/ComplexityMeasures.jl/ComplexityMeasures.jl/docs/Project.toml`
[ab4b797d] ComplexityMeasures v3.8.0 `~/work/ComplexityMeasures.jl/ComplexityMeasures.jl`
First let's define a function that counts concrete subtypes that we will be re-using to count measures in ComplexityMeasures.jl.
concrete_subtypes(type::Type) = concrete_subtypes!(Any[], type)
function concrete_subtypes!(out, type::Type)
if !isabstracttype(type)
push!(out, type)
else
foreach(T -> concrete_subtypes!(out, T), subtypes(type))
end
out
end
concrete_subtypes! (generic function with 1 method)
Count Based Outcome Spaces
Each OutcomeSpace
is a possible way of discretizing the input data. For the purpose of counting measures, we treat the an outcome space with different input parameters as the same outcome space overall.
Some outcome spaces are count-based. We estimate these separately, because they may be estimated with various different probabilities estimators. For our counting here it doesn't matter whether the outcome space supports spatiotemporal or trajectory (uni/multi variate) date. We only care if it is counting based or not.
OUTCOME_SPACES_COUNT = concrete_subtypes(ComplexityMeasures.CountBasedOutcomeSpace)
12-element Vector{Any}:
BubbleSortSwaps
AmplitudeAwareOrdinalPatterns
OrdinalPatterns
WeightedOrdinalPatterns
SpatialBubbleSortSwaps
SpatialDispersion
SpatialOrdinalPatterns
CosineSimilarityBinning
Dispersion
SequentialPairDistances
UniqueElements
ValueBinning
We do a small correction here because two outcome spaces aren't count-based but for internal convenience they satisfy the subtyping relationship
correction_ospaces = (AmplitudeAwareOrdinalPatterns, WeightedOrdinalPatterns)
foreach(
T -> deleteat!(OUTCOME_SPACES_COUNT, findfirst(isequal(T), OUTCOME_SPACES_COUNT)),
correction_ospaces
)
OUTCOME_SPACES_COUNT
10-element Vector{Any}:
BubbleSortSwaps
OrdinalPatterns
SpatialBubbleSortSwaps
SpatialDispersion
SpatialOrdinalPatterns
CosineSimilarityBinning
Dispersion
SequentialPairDistances
UniqueElements
ValueBinning
Probabilities can be estimated from count-based outcome spaces in different ways. We count the same ProbabilitiesEstimators
with different input parameters as the same estimator. Each probabilities estimator can be combined with any outcome space.
PROBESTS_COUNT = concrete_subtypes(ProbabilitiesEstimator)
4-element Vector{Any}:
AddConstant
BayesianRegularization
RelativeAmount
Shrinkage
and we count the combinations
n_outcome_spaces_count = length(OUTCOME_SPACES_COUNT)
n_probests_count = length(PROBESTS_COUNT)
n_probs_count = n_outcome_spaces_count * n_probests_count
40
Non-count-based outcome spaces
We also provide some outcome spaces that are not count-based, but can still be used to estimate discrete probabilities by using some sort of "relative amount" estimation.
OUTCOME_SPACES_NOCOUNT = setdiff(
concrete_subtypes(ComplexityMeasures.OutcomeSpace),
concrete_subtypes(ComplexityMeasures.CountBasedOutcomeSpace),
)
4-element Vector{Any}:
NaiveKernel
PowerSpectrum
TransferOperator
WaveletOverlap
to which we add back the outcome spaces correction
push!(OUTCOME_SPACES_NOCOUNT, correction_ospaces...)
OUTCOME_SPACES_NOCOUNT
6-element Vector{Any}:
NaiveKernel
PowerSpectrum
TransferOperator
WaveletOverlap
AmplitudeAwareOrdinalPatterns
WeightedOrdinalPatterns
Only RelativeAmount
probabilities estimator works with non-count-based outcome spaces
n_probs_noncount = length(OUTCOME_SPACES_NOCOUNT) * 1
6
Grand total of extracting PMFs from data
Therefore the total ways to estimate discrete probabilities from data in ComplexityMeasures.jl is just
n_probs_discrete = n_probs_noncount + n_probs_count
46
Discrete Information measures
Currently, the InformationMeasures implemented are different types of entropies and the lesser-known extropies. Each of these measures, in their discrete form, are functions of probability mass functions (PMFs). Therefore, we can combine each of these measure with probabilities estimated using any count-based outcome space and any probabilities estimator.
Let's collect all of these discrete measures
INFO_MEASURES_DISCRETE = concrete_subtypes(InformationMeasure)
12-element Vector{Any}:
Curado
Identification
Kaniadakis
Renyi
Shannon
StretchedExponential
Tsallis
ElectronicEntropy
FluctuationComplexity
RenyiExtropy
ShannonExtropy
TsallisExtropy
Each information measure can be estimated using any of the generic estimators:
INFO_MEASURE_ESTIMATOR_GENERIC = concrete_subtypes(ComplexityMeasures.DiscreteInfoEstimatorGeneric)
2-element Vector{Any}:
Jackknife
PlugIn
so we count by multiplying
n_discrete_infoest_generic = length(INFO_MEASURES_DISCRETE)*length(INFO_MEASURE_ESTIMATOR_GENERIC)
24
In addition to the generic estimators, we also provide additional estimators specific to Shannon entropy.
INFO_MEASURE_ESTIMATOR_SHANNON = concrete_subtypes(ComplexityMeasures.DiscreteInfoEstimatorShannon)
5-element Vector{Any}:
ChaoShen
GeneralizedSchuermann
HorvitzThompson
MillerMadow
Schuermann
For these there is no variability of the information measure so
n_discrete_estimators_shannon = length(INFO_MEASURE_ESTIMATOR_SHANNON)
5
This gives us the total possible ways of estimating information measures given a PMF:
n_discrete_info_est = n_discrete_estimators_shannon + n_discrete_infoest_generic
29
Grand total of discrete information measures
This total is obtained as the direct multiplication of all ways to obtain a PMF and all ways to compute an information measure from PMF
n_discrete_info = n_discrete_info_est * n_probs_discrete
1334
That's quite a lot and we are only half-way done!
Differential information measures
The differential information measures and their estimators are all grouped into one level of abstraction as long as the user is concerned, so counting things here is very simple!
DIFF_INFO_EST = concrete_subtypes(DifferentialInfoEstimator)
12-element Vector{Any}:
AlizadehArghami
Gao
Goria
KozachenkoLeonenko
Kraskov
Lord
Zhu
ZhuSingh
Correa
Ebrahimi
LeonenkoProzantoSavani
Vasicek
All of these estimate one quantity (the differential Shannon entropy), with the exception of one particular estimator (LeonenkoProzantoSavani
) that can estimate also Tsallis and Renyi entropies. Therefore, the number of differential measures one can estimate within ComplexityMeasures.jl is:
n_diff_info = length(DIFF_INFO_EST) + 2
14
Complexity measures
We also provide a number of estimators that are not probability based, which we call just complexity estimators for this discussion. We count each of these as a separate measure.
COMPLEXITY_ESTIMATORS = concrete_subtypes(ComplexityEstimator)
7-element Vector{Any}:
ApproximateEntropy
BubbleEntropy
LempelZiv76
MissingDispersionPatterns
ReverseDispersion
SampleEntropy
StatisticalComplexity
However, from these we need to treat StatisticalComplexity
separately, so we have
n_complexity_measures_basic = length(COMPLEXITY_ESTIMATORS) - 1
6
In ComplexityMeasures.jl StatisticalComplexity
can be combined with any discrete information measure, any information estimator, and any count-based outcome space. Additionally, StatisticalComplexity
in ComplexityMeasures.jl can be combined with any metric from the Distances.jl package.
For StatisticalComplexity
, counting all possible combinations of outcome spaces, probabilities estimators, information measure definitions, information measure estimators, along with distance measures, as unique measures would over-inflate the measure count. For practicality, we here count different version of StatisticalComplexity
by considering the number of statistical complexity measures resulting from counting unique outcome spaces and information measures, since these are the largest contributors to changes in the computed numerical value of the measure. Therefore we have:
n_complexity_measures_statistical_complexity = length(INFO_MEASURES_DISCRETE) * length(concrete_subtypes(OutcomeSpace))
192
which gives us the following total number of complexity estimators
n_complexity_measures_total = n_complexity_measures_basic + n_complexity_measures_statistical_complexity
198
Probabilities functions
Besides calculating complexity measures, ComplexityMeasures.jl gives the user the unique possibility of accessing the probability mass function directly. As we show in the associated article, this allows rather straightforwardly defining new, or expanding existing, complexity measures. For example, the MissingDispersionPatterns
is just a wrapper of the missing_outcomes
function.
Therefore, we believe it is fair to count a couple of probabilities functions by themselves as additional complexity measures. In particular, we count here the functions probabilities, allproabilities
as candidates for it, as all other functions of the library are simple processing of these two. Given that each function can work with any type of outcome space and probability estimation technique, we obtain
n_extra_prob_measures = 2 * n_probs_discrete
92
Grand total of measures
Right, so the grand total of all measures that can be estimated with ComplexityMeasures.jl are:
n_grand_total =
n_discrete_info +
n_diff_info +
n_complexity_measures_total +
n_extra_prob_measures
1638