# ComplexityMeasures.jl Dev Docs

Good practices in developing a code base apply in every Pull Request. The Good Scientific Code Workshop is worth checking out for this.

All PRs contributing new functionality must be well tested and well documented. You only need to add tests for methods that you **explicitly** extended.

## Adding a new `OutcomeSpace`

### Mandatory steps

- Decide on the outcome space and how the estimator will map probabilities to outcomes.
- Define your type and make it subtype `OutcomeSpace`.
- Add a docstring to your type following the style of the docstrings of other estimators.
- If suitable, the estimator may be able to operate based on `Encoding`s. If so, it is preferred to implement an `Encoding` subtype and extend the methods `encode` and `decode`. This will allow your outcome space to be used with a larger span of entropy and complexity methods without additional effort. Have a look at the file defining `OrdinalPatterns` for an idea of how this works.
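As a rough sketch of the encoding route (the `UnitIntervalEncoding` type and its binning scheme are hypothetical, invented here purely for illustration):

```julia
using ComplexityMeasures

# Hypothetical encoding: bin values in [0, 1) into `n` equal-width bins.
struct UnitIntervalEncoding <: Encoding
    n::Int
end

# `encode` maps an input value to an integer symbol in 1:n.
function ComplexityMeasures.encode(e::UnitIntervalEncoding, x::Real)
    return clamp(floor(Int, x * e.n) + 1, 1, e.n)
end

# `decode` maps the symbol back to a representative value (the bin midpoint).
function ComplexityMeasures.decode(e::UnitIntervalEncoding, i::Int)
    return (i - 0.5) / e.n
end
```

An outcome space built on such an encoding can then simply `encode` each point (or window) of the input and count the resulting symbols.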

If your new outcome space is counting-based, then

- Implement dispatch for `counts_and_outcomes` for your `OutcomeSpace` type. If the outcomes do not come for free, you can instead extend `counts` and then explicitly add another method for `counts_and_outcomes` that calls `counts` first and then decodes the outcomes. Follow existing implementations for guidelines (see for example the source code for `Dispersion`).
- Implement dispatch for `codify`. This ensures that the outcome space also works automatically with any discrete estimators in the downstream CausalityTools.jl.
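A minimal sketch of the counting-based interface, using a hypothetical `SignOutcomeSpace` that classifies each point by its sign (the type and its two outcomes are invented for illustration; check the exact `Counts` return convention against existing implementations such as `Dispersion`):

```julia
using ComplexityMeasures

# Hypothetical outcome space with two outcomes: negative vs. non-negative.
struct SignOutcomeSpace <: OutcomeSpace end

function ComplexityMeasures.counts_and_outcomes(o::SignOutcomeSpace, x)
    n_neg = count(<(0), x)
    cts = Counts([n_neg, length(x) - n_neg])
    return cts, [-1, 1]  # outcomes, in sorted order
end

# `codify` discretizes the input into one integer symbol per data point.
function ComplexityMeasures.codify(o::SignOutcomeSpace, x)
    return [xᵢ < 0 ? 1 : 2 for xᵢ in x]
end
```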

If your new outcome space is not counting-based, then

- Implement dispatch for `probabilities_and_outcomes` for your `OutcomeSpace` type. If the outcomes do not come for free, you can instead extend `probabilities` and then explicitly add another method for `probabilities_and_outcomes` that calls `probabilities` first and then decodes the outcomes. Follow existing implementations for guidelines (see for example the source code for `NaiveKernel`).
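For the non-counting route, a sketch along the same lines (the `SignedMassOutcomeSpace` below is hypothetical; it assigns probability mass proportional to the absolute values falling into each of two sign bins):

```julia
using ComplexityMeasures

# Hypothetical non-counting outcome space: probability mass per sign bin.
struct SignedMassOutcomeSpace <: OutcomeSpace end

function ComplexityMeasures.probabilities_and_outcomes(o::SignedMassOutcomeSpace, x)
    m_neg = sum(abs, filter(<(0), x); init = 0.0)
    m_pos = sum(abs, filter(>=(0), x); init = 0.0)
    probs = Probabilities([m_neg, m_pos] ./ (m_neg + m_pos))
    return probs, [-1, 1]  # outcomes, in sorted order
end
```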

Finally,

- Implement dispatch for `outcome_space` and your `OutcomeSpace` type. The return value of `outcome_space` must be sorted (as in the default behavior of `sort`, i.e. in ascending order).
- Add your outcome space type to the table list in the documentation string of `OutcomeSpace`. If you made an encoding, also add it to the corresponding table in the encodings section.
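For instance, for a hypothetical two-outcome space, the method can be as simple as (note the ascending order of the return value):

```julia
using ComplexityMeasures

struct CoinFlipOutcomeSpace <: OutcomeSpace end  # hypothetical example type

# Must return *all* possible outcomes, sorted in ascending order.
ComplexityMeasures.outcome_space(o::CoinFlipOutcomeSpace) = [0, 1]
```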

### Optional steps

The following methods may be extended for your `OutcomeSpace` if doing so leads to performance benefits.

- `total_outcomes`: by default it returns the `length` of `outcome_space`. This is the function that most typically has performance benefits if implemented explicitly, so most existing estimators extend it by default.
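For example, for a hypothetical outcome space with six fixed outcomes, returning the count directly avoids materializing the outcome vector:

```julia
using ComplexityMeasures

struct DieRollOutcomeSpace <: OutcomeSpace end  # hypothetical example type

ComplexityMeasures.outcome_space(o::DieRollOutcomeSpace) = collect(1:6)

# O(1) shortcut; the fallback would compute `length(outcome_space(o))`.
ComplexityMeasures.total_outcomes(o::DieRollOutcomeSpace) = 6
```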

## Adding a new `ProbabilitiesEstimator`

### Mandatory steps

- Define your type and make it subtype `ProbabilitiesEstimator`.
- Add a docstring to your type following the style of the docstrings of other `ProbabilitiesEstimator`s.
- Implement dispatch for `probabilities` for your `ProbabilitiesEstimator` type. You'll then get `probabilities_and_outcomes` for free.
- Implement dispatch for `allprobabilities_and_outcomes` for your `ProbabilitiesEstimator` type.
- Add your new `ProbabilitiesEstimator` type to the list of probabilities estimators in the probabilities estimators documentation section.
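As a hedged sketch of the `probabilities` step (the `LaplaceSmoothing` estimator below is hypothetical, invented for illustration — it applies additive smoothing to raw counts; mirror the exact dispatch signature from existing estimators in the source):

```julia
using ComplexityMeasures

# Hypothetical estimator: additive (Laplace) smoothing of raw counts.
struct LaplaceSmoothing <: ProbabilitiesEstimator
    c::Float64
end

function ComplexityMeasures.probabilities(est::LaplaceSmoothing, o::OutcomeSpace, x)
    freqs = vec(counts(o, x)) .+ est.c  # pseudo-count added to every outcome
    return Probabilities(freqs ./ sum(freqs))
end
```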

## Adding a new `InformationMeasureEstimator`

The type implementation should follow the declared API of `InformationMeasureEstimator`. If the type is a discrete measure, then extend `information(e::YourType, p::Probabilities)`. If it is a differential measure, then extend `information(e::YourType, x::InputData)`.
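A sketch of the discrete case (the `MyDiscreteEst` wrapper is hypothetical; it simply delegates to the `PlugIn` estimator, where a real estimator would apply e.g. a bias correction):

```julia
using ComplexityMeasures

# Hypothetical discrete estimator that wraps a measure definition.
Base.@kwdef struct MyDiscreteEst{I <: InformationMeasure} <: DiscreteInfoEstimator{I}
    definition::I = Shannon()
end

# Discrete estimators operate on a `Probabilities` vector.
function ComplexityMeasures.information(e::MyDiscreteEst, p::Probabilities)
    h = information(PlugIn(e.definition), p)
    return h  # a real estimator would apply its correction here
end
```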

`ComplexityMeasures.InformationMeasureEstimator` — Type

    InformationMeasureEstimator{I <: InformationMeasure}

The supertype of all information measure estimators. Its direct subtypes are `DiscreteInfoEstimator` and `DifferentialInfoEstimator`.

Since all estimators must reference a measure definition in some way, we made the following interface decisions:

- all estimators have as first type parameter `I <: InformationMeasure`
- all estimators reference the information measure in a `definition` field
- all estimators are defined using `Base.@kwdef` so that they may be initialized with the syntax `Estimator(; definition = Shannon())` (or any other).

Any concrete subtypes must follow the above, e.g.:

```julia
Base.@kwdef struct MyEstimator{I <: InformationMeasure, X} <: DiscreteInfoEstimator{I}
    definition::I
    x::X
end
```

In real applications, we generally don't have access to the underlying probability mass functions or densities required to compute the various entropy or extropy definitions. Therefore, these information measures must be *estimated* from finite data. Estimating a particular measure (e.g. `Shannon` entropy) can be done in many ways, each with its own pros and cons. We aim to provide a complete library of literature estimators of the various information measures (PRs are welcome!).

## Adding a new `InformationMeasure`

This amounts to adding a new definition of an information measure, not an estimator. In practice, it means adding a method for the discrete `PlugIn` estimator.

### Mandatory steps

- Define your information measure definition type and make it subtype `InformationMeasure`.
- Implement dispatch for `information(def::YourType, p::Probabilities)`. This is the `PlugIn` estimator for the discrete measure.
- Add a docstring to your type following the style of the docstrings of other information measure definitions; it should include the mathematical definition of the measure.
- Add your information measure definition type to the list of definitions in the `docs/src/information_measures.md` documentation page.
- Add a reference to your information measure definition in the docstring for `InformationMeasure`.
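A sketch of a new measure definition (the `MyQuadraticMeasure` below is hypothetical, a collision-entropy-like quantity invented purely for illustration; the package already covers this particular case via `Renyi`):

```julia
using ComplexityMeasures

# Hypothetical measure: -log of the collision probability sum(pᵢ²).
struct MyQuadraticMeasure <: InformationMeasure end

# This method *is* the discrete PlugIn estimator for the measure.
function ComplexityMeasures.information(def::MyQuadraticMeasure, p::Probabilities)
    return -log(sum(abs2, p))
end
```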

### Optional steps

- If the maximum value of your information measure type is analytically computable for a probability distribution with a known number of elements, implementing dispatch for `information_maximum` automatically enables `information_normalized` for your type.
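For example, for a hypothetical collision-entropy-like measure, the maximum over distributions with `L` outcomes is attained at the uniform distribution and equals `log(L)` (the measure type here is invented for illustration):

```julia
using ComplexityMeasures

struct MyMaxDemoMeasure <: InformationMeasure end  # hypothetical measure

ComplexityMeasures.information(def::MyMaxDemoMeasure, p::Probabilities) = -log(sum(abs2, p))

# Uniform distribution over L outcomes: -log(L * (1/L)^2) = log(L).
ComplexityMeasures.information_maximum(def::MyMaxDemoMeasure, L::Int) = log(L)
```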

## Adding a new `MultiScaleAlgorithm`

A new `MultiScaleAlgorithm` is simply a new way of coarse-graining input time series across multiple scales.

### Mandatory steps

- Define a new type `YourNewMultiScaleType <: MultiScaleAlgorithm`. This type defines how the coarse graining is performed.
- Implement dispatch for `downsample`, which transforms the original time series into a vector of coarse-grained time series, one per scale (may be nested if needed).
- Implement dispatch for the internal `apply_multiscale` function.
- Add an entry for your new type in the `multiscale.md` file.
- Add tests for your new type. You specifically need to implement analytical tests that verify that `downsample` is correctly implemented. For API tests, simply copy the tests from e.g. `tests/multiscale/Composite.jl`, and replace the multiscale coarse-graining algorithm with an instance of your algorithm.
- Hooray! Your new coarse-graining procedure is integrated with the entire ComplexityMeasures.jl ecosystem!
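The `downsample` step could, as a hedged sketch, look like the following (the `BlockMean` algorithm is hypothetical, and the dispatch shown assumes a per-scale signature `downsample(method, s, x)` — mirror the actual signature from the existing implementations in the source):

```julia
using ComplexityMeasures
using Statistics: mean

# Hypothetical coarse-graining: means over non-overlapping blocks of length s.
struct BlockMean <: MultiScaleAlgorithm end

function ComplexityMeasures.downsample(method::BlockMean, s::Int, x::AbstractVector)
    N = div(length(x), s)  # number of complete blocks; leftover points are dropped
    return [mean(@view x[(i - 1)*s + 1:i*s]) for i in 1:N]
end
```

An analytical test for this algorithm would then assert, for a short hand-computed input, that each block mean is exactly as expected.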