RecurrenceMicrostatesAnalysis.jl for devs

Tip

All pull requests that introduce new functionality must be thoroughly tested and documented. Tests are required only for the methods you extend. Always remember to add docstrings to your implementations, as well as tests to validate them.

We recommend reading the Good Scientific Code Workshop

Backend of RecurrenceMicrostatesAnalysis.jl

RecurrenceMicrostatesAnalysis.jl has multiple backends, depending on the usage context. Each backend is implemented based on an internal struct RMACore or on the input data type.

There are three backend implementations:

  • For sequential data using the CPU: it internally uses a CPUCore <: RMACore, which defines that the process must run on the CPU, and the input data is a StateSpaceSet (or a Vector{<:Real}).

  • For spatial data using the CPU: it internally uses a CPUCore <: RMACore, which defines that the process must run on the CPU, and the input data is an Array.

  • For sequential data using the GPU: it internally uses a GPUCore <: RMACore, which defines that the process must run on the GPU, and the input data is an AbstractGPUVector{<:SVector}.

Note that there is a significant difference between the input data types, in such a way that RMACore is just an auxiliary struct used to differentiate the hardware backend internally.

Backend

The package backends are located at src/core/cpu_core.jl (CPU) and src/core/gpu/gpu_core.jl (GPU).

You do not need to fully understand how the backend operates to define something new in the package. However, it is important to understand how the backend requires internal functions based on RMACore or the input data type. For example, when implementing a RecurrenceExpression, the CPU structure uses a recurrence function, while the GPU structure uses a gpu_recurrence function.

Adding a new Recurrence Expression

Steps

  1. Define the mathematical expression of your recurrence expression. It must return a binary value: UInt(0) for non-recurrence and UInt(1) for recurrence.
  2. Define a new type YourType <: RecurrenceExpression. Constant parameters (e.g., recurrence threshold and metric) should be fields of this type.
  3. Implement the appropriate recurrence dispatch:
    • Sequential data: recurrence(expr::YourType, x::StateSpaceSet, y::StateSpaceSet, i::Int, j::Int)
    • Spatial data: recurrence(expr::YourType, x::AbstractArray{<:Real}, y::AbstractArray{<:Real}, i::NTuple{N,Int}, j::NTuple{M,Int})
    • GPU: gpu_recurrence(expr::YourType, x, y, i, j, n)
  4. Add a docstring describing the mathematical definition and relevant references.
  5. Add the recurrence expression to docs/src/api.md.
  6. Add the expression to the RecurrenceExpression docstring.
  7. Add tests to test/distributions.jl and test/recurrences.jl.

Example

Let's define a "recurrence" expression as:

\[r_{(i,j)} = \Theta(\|\vec{x}_i - \vec{x}_j\| - \varepsilon).\]

First, we define our struct:

using RecurrenceMicrostatesAnalysis
using Distances: Euclidean, Metric, evaluate

struct MyRecurrenceExpr{T <: Real, M <: Metric} <: RecurrenceExpression{T, M}
    ε::T
    metric::M
end

MyRecurrenceExpr(ε) = MyRecurrenceExpr(ε, Euclidean())
Main.MyRecurrenceExpr

Next, we define the recurrence function:

@inline function RecurrenceMicrostatesAnalysis.recurrence(
    expr::MyRecurrenceExpr,
    x::StateSpaceSet,
    y::StateSpaceSet,
    i::Int,
    j::Int,
)
    distance = @inbounds evaluate(expr.metric, x.data[i], y.data[j])
    return UInt8(distance ≥ expr.ε)
end

And that's it:

rmspace = RecurrenceMicrostates(MyRecurrenceExpr(0.27), 3)
RecurrenceMicrostates, with 3 fields:
 shape = RecurrenceMicrostatesAnalysis.Rect2Microstate{3, 3, 2}()
 expr = Main.MyRecurrenceExpr{Float64, Euclidean}(0.27, Euclidean(0.0))
 sampling = SRandom{Float64}(0.05)
X = randn(1000)
probabilities(rmspace, X)
 Probabilities{Float64,1} over 512 outcomes
   1  0.0001405594265175398
   2  6.023975422180277e-5
   3  2.007991807393426e-5
   4  2.007991807393426e-5
   5  2.007991807393426e-5
   6  0.0
   7  2.007991807393426e-5
   8  0.0003815184434047509
   9  0.0
  10  2.007991807393426e-5
   ⋮  
 504  0.036003293106564124
 505  0.0025702295134635853
 506  0.009297002068231561
 507  0.00773076845846469
 508  0.037971125077809684
 509  0.008975723379048613
 510  0.037288407863295917
 511  0.03624425212345134
 512  0.2602959779924098

Adding a new Sampling Mode

Steps

  1. Define how the sampling mode operates: which microstates are sampled, from which regions, and in what quantity. The SamplingSpace must be taken into account when designing the sampling logic.
  2. Define a new struct that is a subtype of SamplingMode. The struct may be empty (e.g., Full) or contain parameters such as a sampling ratio (e.g., SRandom).
  3. Implement the dispatch get_num_samples(mode::YourType, ::SamplingSpace), which determines the number of samples to be drawn given the sampling mode and the sampling space.
  4. Implement the dispatch get_sample(::RMACore, ::YourType, space::SamplingSpace, rng, m), which returns the starting pair $(i, j)$ to construct the microstate. Here, RMACore defines whether it is running on the CPU or GPU, rng is a random number generator, and m is a counter of microstates.
  5. Add a docstring to your sampling mode describing its behavior and initialization.
  6. Add your sampling mode to docs/src/api.md.
  7. Add the expression to the SamplingMode docstring.
  8. Add tests to test/distributions.jl and test/sampling.jl.

Adding a new Microstate Shape

Steps

  1. Define your microstate design. It essentially determines the microstate structure and reading order. For example, square microstates are read row-wise, while triangular microstates may be read column-wise. Each position in the microstate structure must be associated with a power of two in order to convert the binary microstate into a decimal index.
  2. Define a new struct that is a subtype of MicrostateShape.
  3. Implement the dispatch get_histogram_size(::MyShape), which returns the histogram length.
  4. Implement the dispatch get_power_vector(::RMACore, ::MyShape), which returns the power vector used to read the microstate as an integer.
  5. Implement the dispatch get_offsets(::RMACore, ::MyShape), which returns which positions are accessed from $(i, j)$ to construct the microstate.
  6. Define how your shape reacts to a SamplingSpace by implementing SamplingSpace(::MyType, x, y).
  7. Add a docstring to your microstate shape describing its behavior and initialization.
  8. Add your microstate shape to docs/src/api.md.
  9. Add the expression to the MicrostateShape docstring.
  10. Add tests to test/distributions.jl and test/shapes.jl.

Example

As an example, let's try to construct the struct to obtain the microstate:

\[\begin{matrix} r_{(i,j)} & & r_{(i, j+2)} \\ & r_{(i+1, j+1)} & \\ r_{(i+2,j)} & & r_{(i+2, j+2)} \end{matrix}\]

For simplicity, we are not going to generalize it for arbitrary sizes 😅. First, let's define our struct:

struct MyMicrostateShape <: MicrostateShape end

Next, we need to define the histogram size, which will be returned by get_histogram_size. It is $2^\sigma$, where $\sigma$ is the number of recurrences contained within the microstate shape. For a square microstate we have $\sigma = N^2$, and for a triangular microstate $\sigma = N(N - 1)\div 2$. In our example, it is simple since our shape is fixed: $\sigma = 5$, so:

RecurrenceMicrostatesAnalysis.get_histogram_size(::MyMicrostateShape) = 2^5

The next step is to define our power vector. It determines how we read our microstate as an integer. Each position should be associated with a power of 2:

\[\begin{matrix} 2^0r_{(i,j)} & & 2^1r_{(i, j+2)} \\ & 2^2r_{(i+1, j+1)} & \\ 2^3r_{(i+2,j)} & & 2^4r_{(i+2, j+2)} \end{matrix}\]

Then, we write:

RecurrenceMicrostatesAnalysis.get_power_vector(::RecurrenceMicrostatesAnalysis.CPUCore, ::MyMicrostateShape) = SVector{5, Int}([1, 2, 4, 8, 16])

Finally, we need to define the set of offsets used to construct the microstate from the trajectory. Note that each offset must have the same index as the corresponding element in the power vector.

function RecurrenceMicrostatesAnalysis.get_offsets(::RecurrenceMicrostatesAnalysis.CPUCore, ::MyMicrostateShape)
    elems = [
        SVector{2, Int}([0, 0]),
        SVector{2, Int}([0, 2]),
        SVector{2, Int}([1, 1]),
        SVector{2, Int}([2, 0]),
        SVector{2, Int}([2, 2])
    ]

    return SVector{5, SVector{2, Int}}(elems)
end

Now, we just need to define how our microstate will behave with respect to a sampling space. It is not necessary to define it for all available sampling spaces, but you need to do so for at least one of them.

RecurrenceMicrostatesAnalysis.SamplingSpace(
    ::MyMicrostateShape,
    x::StateSpaceSet,
    y::StateSpaceSet
) = RecurrenceMicrostatesAnalysis.SSRect2(length(x) - 2, length(y) - 2)

And that's it, we can now use our new microstate shape 🙂 (and why not combine it with the previous recurrence expression?!)

rmspace = RecurrenceMicrostates(MyRecurrenceExpr(0.27), MyMicrostateShape())
RecurrenceMicrostates, with 3 fields:
 shape = Main.MyMicrostateShape()
 expr = Main.MyRecurrenceExpr{Float64, Euclidean}(0.27, Euclidean(0.0))
 sampling = SRandom{Float64}(0.05)
probabilities(rmspace, X)
 Probabilities{Float64,1} over 32 outcomes
  1  0.00046183811570048793
  2  0.00016063934459147408
  3  0.0002811188530350796
  4  0.0028915082026465333
  5  0.0023694303327242423
  6  0.0011847151663621212
  7  0.0010039959036967129
  8  0.015762735688038394
  9  0.00036143852533081667
 10  0.0027308688580550593
  ⋮  
 24  0.07509889359651413
 25  0.002871428284572599
 26  0.012570028714282845
 27  0.012650348386578582
 28  0.07891407803056164
 29  0.016184413967591012
 30  0.07542017228569707
 31  0.0753197726953274
 32  0.45213951527077767

Adding a new quantity estimator

Since RecurrenceMicrostatesAnalysis.jl uses the same structure as ComplexityMeasures.jl to estimate or measure complexity values (e.g., determinism, disorder, etc.), the method to implement new features is very similar.

To add new quantity estimators, refer to the ComplexityMeasures.jl Dev Docs.

Tests for new quantifiers implemented on RecurrenceMicrostatesAnalysis.jl need to be add to test/rqa.jl.

Adding a new GPU metric

Due to the incompatibility of Distances.jl with GPUs, it may be necessary to redefine some metrics to use them with the RecurrenceMicrostatesAnalysis.jl GPU backend.

Steps

  1. Define a new type that is a subtype of GPUMetric.
  2. Implement the dispatch gpu_evaluate(::YourMetric, x, y, i, j, n). Here, i and j indicate which positions of the AbstractGPUVector are accessed (i for x, and j for y), and n is the number of dimensions of the system.
  3. Add a docstring to your metric describing it.
  4. Add your metric to docs/src/api.md.
  5. Add the expression to the GPUMetric docstring.
  6. Add tests to test/utils.jl.