# ComplexityMeasures.jl Dev Docs
Good practices in developing a code base apply in every Pull Request. The Good Scientific Code Workshop is worth checking out for this.
All PRs contributing new functionality must be well tested and well documented. You only need to add tests for methods that you explicitly extended.
## Adding a new `OutcomeSpace`

### Mandatory steps
- Decide on the outcome space and how the estimator will map probabilities to outcomes.
- Define your type and make it subtype `OutcomeSpace`.
- Add a docstring to your type following the style of the docstrings of other estimators.
- If suitable, the estimator may be able to operate based on `Encoding`s. If so, it is preferred to implement an `Encoding` subtype and extend the methods `encode` and `decode`. This will allow your outcome space to be used with a larger span of entropy and complexity methods without additional effort. Have a look at the file defining `OrdinalPatterns` for an idea of how this works.
If your new outcome space is counting-based, then
- Implement dispatch for `counts` for your `OutcomeSpace` type. This method should return a `Counts` instance (just a wrapper around a `DimArray`). Follow existing implementations for guidelines, and ensure that the outcomes are the dimension labels on the array. You'll then get `counts_and_outcomes` for free.
- Implement dispatch for `symbolize`. This will ensure that the outcome space also works automatically with any discrete estimators in CausalityTools.jl.
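The counting-based path can be sketched as follows. This is a hypothetical, standalone illustration: `OutcomeSpace` is stubbed locally, `UniqueValues` is an invented toy outcome space, and plain arrays stand in for the `Counts`-wrapped `DimArray` that a real implementation should return.

```julia
# Stand-in for the abstract type exported by ComplexityMeasures.jl.
abstract type OutcomeSpace end

# Hypothetical counting-based outcome space: the distinct values in the input.
struct UniqueValues <: OutcomeSpace end

# Count occurrences of each outcome. A real method should wrap the result
# in a `Counts` with the outcomes as dimension labels; here we simply return
# the counts alongside their (sorted) outcomes.
function counts(::UniqueValues, x)
    outcomes = sort!(unique(x))
    cts = [count(==(ω), x) for ω in outcomes]
    return cts, outcomes
end

cts, Ω = counts(UniqueValues(), [1, 2, 2, 3, 3, 3])
```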
If your new outcome space is not counting-based, then
- Implement dispatch for `probabilities` for your `OutcomeSpace` type. This method should return a `Probabilities` instance (also just a wrapper around a `DimArray`, with the constraint that the entries of the array must sum to 1). Follow existing implementations for guidelines, and ensure that the outcomes are the dimension labels on the probabilities array. You'll then get the methods `probabilities_and_outcomes` and `outcomes` for free.
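For the non-counting path, here is a hypothetical sketch with stubbed types: an outcome space whose probabilities are amplitude-weighted rather than derived from raw counts. `UnitIntervalBinning` is invented for illustration, and a plain normalized vector stands in for the `Probabilities` a real implementation should return.

```julia
abstract type OutcomeSpace end  # stand-in for the package's abstract type

# Hypothetical outcome space: `n` equal-width bins over [0, 1).
struct UnitIntervalBinning <: OutcomeSpace
    n::Int
end

# Weight each bin by the amplitude of the points falling into it, then
# normalize so the entries sum to 1 (the `Probabilities` constraint).
function probabilities(o::UnitIntervalBinning, x)
    w = zeros(o.n)
    for xi in x
        bin = clamp(floor(Int, xi * o.n) + 1, 1, o.n)
        w[bin] += xi
    end
    return w ./ sum(w)
end

p = probabilities(UnitIntervalBinning(4), [0.1, 0.2, 0.6, 0.9])
```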
Finally,
- Implement dispatch for `outcome_space` for your `OutcomeSpace` type. The return value of `outcome_space` must be sorted (as in the default behavior of `sort`, i.e. in ascending order).
- Add your outcome space type to the table list in the outcome space documentation page. If you made an encoding, also add it to the corresponding table in the encodings section.
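The sortedness requirement can be sketched like so. The abstract type is stubbed and `CoinFlips` is a hypothetical toy outcome space:

```julia
abstract type OutcomeSpace end  # stand-in for the package's abstract type

# Hypothetical outcome space with a fixed, finite alphabet.
struct CoinFlips <: OutcomeSpace end

# The returned outcomes must be in ascending `sort` order.
outcome_space(::CoinFlips) = sort([:heads, :tails])

Ω = outcome_space(CoinFlips())
```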
### Optional steps
The following methods may be extended for your `OutcomeSpace` if doing so leads to performance benefits.

- `total_outcomes`. By default it returns the `length` of `outcome_space`. This is the function that most typically has performance benefits if implemented explicitly, so most existing estimators extend it by default.
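A sketch of why extending `total_outcomes` can pay off (stubbed types; `DiceRolls` and the fallback method are hypothetical): the fallback materializes the whole outcome space just to measure its length, while the explicit method returns a constant.

```julia
abstract type OutcomeSpace end  # stand-in for the package's abstract type

struct DiceRolls <: OutcomeSpace end
outcome_space(::DiceRolls) = collect(1:6)

# Fallback in the spirit of the default: length of the full outcome space.
total_outcomes(o::OutcomeSpace) = length(outcome_space(o))

# Explicit extension: no allocation of the outcome space at all.
total_outcomes(::DiceRolls) = 6
```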
## Adding a new `ProbabilitiesEstimator`

### Mandatory steps
- Define your type and make it subtype `ProbabilitiesEstimator`.
- Add a docstring to your type following the style of the docstrings of other `ProbabilitiesEstimator`s.
- Implement dispatch for `probabilities` for your `ProbabilitiesEstimator` type. You'll then get `probabilities_and_outcomes` for free.
- Implement dispatch for `allprobabilities` for your `ProbabilitiesEstimator` type. You'll then get `allprobabilities_and_outcomes` for free.
- Add your new `ProbabilitiesEstimator` type to the list of probabilities estimators in the probabilities estimators documentation section.
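As an illustration of the estimator concept only (copy the actual method signatures from existing implementations, not from here), a hypothetical add-one ("Laplace") smoothing estimator acting on raw counts:

```julia
abstract type ProbabilitiesEstimator end  # stand-in for the package's abstract type

# Hypothetical estimator: add-one smoothing, so unobserved outcomes
# still receive nonzero probability.
struct AddOneSmoothing <: ProbabilitiesEstimator end

function probabilities(::AddOneSmoothing, cts::AbstractVector{<:Integer})
    smoothed = cts .+ 1
    return smoothed ./ sum(smoothed)
end

p = probabilities(AddOneSmoothing(), [3, 0, 1])
```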
## Adding a new `InformationMeasureEstimator`
The type implementation should follow the declared API of `InformationMeasureEstimator`. If the type is a discrete measure, then extend `information(e::YourType, p::Probabilities)`. If it is a differential measure, then extend `information(e::YourType, x::InputData)`.
`ComplexityMeasures.InformationMeasureEstimator` — Type

```julia
InformationMeasureEstimator{I <: InformationMeasure}
```

The supertype of all information measure estimators. Its direct subtypes are `DiscreteInfoEstimator` and `DifferentialInfoEstimator`.
Since all estimators must reference a measure definition in some way, we made the following interface decisions:
- all estimators have as first type parameter `I <: InformationMeasure`
- all estimators reference the information measure in a `definition` field
- all estimators are defined using `Base.@kwdef` so that they may be initialized with the syntax `Estimator(; definition = Shannon())` (or any other).
Any concrete subtypes must follow the above, e.g.:
```julia
Base.@kwdef struct MyEstimator{I <: InformationMeasure, X} <: DiscreteInfoEstimator{I}
    definition::I
    x::X
end
```
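Given such a definition, keyword construction works as the interface decisions above describe. A standalone sketch: the abstract types and `Shannon` are stubbed here so the snippet runs without the package.

```julia
abstract type InformationMeasure end
abstract type DiscreteInfoEstimator{I <: InformationMeasure} end

struct Shannon <: InformationMeasure end  # stub of the package's Shannon definition

Base.@kwdef struct MyEstimator{I <: InformationMeasure, X} <: DiscreteInfoEstimator{I}
    definition::I
    x::X
end

# Keyword-based initialization, as required by the interface.
est = MyEstimator(definition = Shannon(), x = 0.5)
```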
In real applications, we generally don't have access to the underlying probability mass functions or densities required to compute the various entropy or extropy definitions. Therefore, these information measures must be estimated from finite data. Estimating a particular measure (e.g. `Shannon` entropy) can be done in many ways, each with its own pros and cons. We aim to provide a complete library of literature estimators of the various information measures (PRs are welcome!).
## Adding a new `InformationMeasure`
This amounts to adding a new definition of an information measure, not an estimator. In practice, it means adding a method for the discrete Plug-In estimator.
### Mandatory steps
- Define your information measure definition type and make it subtype `InformationMeasure`.
- Implement dispatch for `information(def::YourType, p::Probabilities)`. This is the Plug-In estimator for the discrete measure.
- Add a docstring to your type following the style of the docstrings of other information measure definitions; it should include the mathematical definition of the measure.
- Add your information measure definition type to the list of definitions in the `docs/src/information_measures.md` documentation page.
- Add a reference to your information measure definition in the docstring for `InformationMeasure`.
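A minimal sketch of the Plug-In pattern, with stubbed types and a plain probability vector standing in for `Probabilities` (`ToyShannon` is hypothetical):

```julia
abstract type InformationMeasure end  # stand-in for the package's abstract type

# Hypothetical measure definition: Shannon entropy in nats.
struct ToyShannon <: InformationMeasure end

# The Plug-In estimator: evaluate the definition directly on the
# estimated probability mass function, skipping zero-probability terms.
function information(::ToyShannon, p::AbstractVector{<:Real})
    return -sum(pᵢ * log(pᵢ) for pᵢ in p if pᵢ > 0)
end

h = information(ToyShannon(), [0.5, 0.5])
```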
### Optional steps
- If the maximum value of your information measure type is analytically computable for a probability distribution with a known number of elements, implementing dispatch for `information_maximum` automatically enables `information_normalized` for your type.
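The normalization mechanism can be sketched with stubbed types (`ToyShannon` and the method bodies are hypothetical; in the package, `information_normalized` itself comes for free once `information_maximum` is extended):

```julia
abstract type InformationMeasure end  # stand-in for the package's abstract type

struct ToyShannon <: InformationMeasure end

information(::ToyShannon, p) = -sum(pᵢ * log(pᵢ) for pᵢ in p if pᵢ > 0)

# Analytical maximum over `L` outcomes: attained by the uniform distribution.
information_maximum(::ToyShannon, L::Int) = log(L)

# Mimics what the package derives automatically once the maximum is known.
information_normalized(m::ToyShannon, p) =
    information(m, p) / information_maximum(m, length(p))

r = information_normalized(ToyShannon(), [0.25, 0.25, 0.25, 0.25])
```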