API

This page documents the package API. For a general overview of the package's functionality, consult the README.

General Functionalities

StreamSampling.ReservoirSample (Type)
ReservoirSample{T}([rng], method = AlgRSWRSKIP())
ReservoirSample{T}([rng], n::Int, method = AlgL(); ordered = false)

Initializes a reservoir sample, which can then be updated with fit!. The first signature creates a sample that collects a single element; the second collects n elements. If ordered is true, the sampled values can be retrieved in the order they were collected by calling ordvalue.

See the Sampling Algorithms section for the supported methods.

StatsAPI.fit! (Function)
fit!(rs::AbstractReservoirSample, el)
fit!(rs::AbstractReservoirSample, el, w)

Updates the reservoir sample with the element el. If the sampling is weighted, the weight w of the element must also be passed.
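
The two forms of fit! can be sketched as follows. This assumes the package is loaded with using StreamSampling and that fit! and the algorithm types are exported; the seed and the weights are made up for illustration.

```julia
using StreamSampling
using Random

rng = Xoshiro(42)  # arbitrary seed, chosen only for reproducibility

# Unweighted reservoir of 5 elements, using the default method
# given in the second ReservoirSample signature.
rs = ReservoirSample{Int}(rng, 5)
for x in 1:1000
    fit!(rs, x)
end

# Weighted reservoir: each element is passed together with its weight.
# AlgAExpJ is one of the weighted methods listed under Sampling Algorithms.
wrs = ReservoirSample{Int}(rng, 5, AlgAExpJ())
for x in 1:1000
    fit!(wrs, x, Float64(x))  # illustrative weight, proportional to x
end
```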

Base.merge! (Function)
Base.merge!(rs::AbstractReservoirSample, rs::AbstractReservoirSample...)

Updates the first reservoir sample by merging its values with those of the other samples. Currently supported only for samples with replacement.

Base.merge (Function)
Base.merge(rs::AbstractReservoirSample...)

Creates a new reservoir sample by merging the values of the samples passed. Currently supported only for samples with replacement.
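
A minimal sketch of merging, assuming two with-replacement reservoirs of the same size built with AlgRSWRSKIP (the seeds and the split of the stream into two halves are illustrative only):

```julia
using StreamSampling
using Random

# Two with-replacement reservoirs, each fed a different part of the stream.
rs1 = ReservoirSample{Int}(Xoshiro(1), 10, AlgRSWRSKIP())
rs2 = ReservoirSample{Int}(Xoshiro(2), 10, AlgRSWRSKIP())
for x in 1:500
    fit!(rs1, x)
end
for x in 501:1000
    fit!(rs2, x)
end

# merge builds a new sample and leaves its arguments untouched;
# merge! updates rs1 in place instead.
merged = merge(rs1, rs2)
merge!(rs1, rs2)
```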

Base.empty! (Function)
Base.empty!(rs::AbstractReservoirSample)

Resets the reservoir sample to its initial state. Useful to avoid allocating a new sample in some cases.

OnlineStatsBase.value (Function)
value(rs::AbstractReservoirSample)

Returns the elements collected in the sample at the current sampling stage.

Note that although the sample follows the sampling scheme chosen when the ReservoirSample was instantiated, some orderings of the sample can be more probable than others. To make every ordering equally probable, call shuffle! on the result.
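
As a small sketch of the shuffling step, assuming shuffle! from the Random standard library is the function meant here. Shuffling a copy is a precaution taken in this example so that the reservoir's own storage is left untouched; it is not something the documentation requires.

```julia
using StreamSampling
using Random

rng = Xoshiro(0)  # arbitrary seed
rs = ReservoirSample{Int}(rng, 5)
for x in 1:100
    fit!(rs, x)
end

# Current contents of the reservoir; the order may be biased.
collected = value(rs)

# Randomize the order on a copy, so every ordering is equally likely.
shuffled = shuffle!(rng, copy(collected))
```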

StreamSampling.ordvalue (Function)
ordvalue(rs::AbstractReservoirSample)

Returns the elements collected in the sample at the current sampling stage, in the order they were collected. This is available only when ordered = true was passed to ReservoirSample.
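
A sketch of an ordered reservoir, assuming the default method accepts the ordered keyword as in the ReservoirSample signature. The stream here is increasing, so the collection order should coincide with increasing order.

```julia
using StreamSampling
using Random

# ordered = true asks the reservoir to keep track of collection order.
rs = ReservoirSample{Int}(Xoshiro(7), 5; ordered = true)
for x in 1:100
    fit!(rs, x)
end

# Elements in the order they were collected from the stream.
in_order = ordvalue(rs)
```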

StatsAPI.nobs (Function)
nobs(rs::AbstractReservoirSample)

Returns the total number of elements that have been observed so far during the sampling process.
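
The following sketch combines nobs with empty! to reuse one reservoir for two batches. It assumes both functions are reachable after using StreamSampling, and that resetting to the initial state also resets the observation count.

```julia
using StreamSampling

rs = ReservoirSample{Int}(3)
for x in 1:50
    fit!(rs, x)
end
seen_first = nobs(rs)   # number of elements passed to fit! so far

# Reset in place instead of allocating a new sample.
empty!(rs)
for x in 51:60
    fit!(rs, x)
end
seen_second = nobs(rs)  # counts only the elements seen after the reset
```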

StreamSampling.StreamSample (Type)
StreamSample{T}([rng], iter, n, [N], method = AlgD())

Initializes a stream sample, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in StreamSample require the total number of elements in the stream, N; if it is not provided, it is assumed to be available by calling length(iter).
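
A minimal sketch of iterating a StreamSample, following the signature above. The range, the seed, and the explicit N are illustrative; N could be omitted here because length(iter) is available.

```julia
using StreamSampling
using Random

iter = 1:10^6
N = length(iter)  # total number of elements in the stream

# Sample 10 elements from iter with the default method.
ss = StreamSample{Int}(Xoshiro(3), iter, 10, N)

# Sampled elements are produced lazily while iterating.
sampled = Int[]
for x in ss
    push!(sampled, x)
end
```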

StreamSampling.itsample (Function)
itsample([rng], iter, method = AlgRSWRSKIP())
itsample([rng], iter, wfunc, method = AlgWRSWRSKIP())

Returns a random element of the iterator, optionally specifying an rng (which defaults to Random.default_rng()) and a function wfunc that accepts each element as input and returns the corresponding weight. If the iterator is empty, it returns nothing.


itsample([rng], iter, n::Int, method = AlgL(); ordered = false)
itsample([rng], iter, wfunc, n::Int, method = AlgAExpJ(); ordered = false)

Returns a vector of n random elements of the iterator, optionally specifying an rng (which defaults to Random.default_rng()), a weight function wfunc, and a method. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in iter) must be collected.

When sampling without replacement, if the iterator has fewer than n elements, a vector containing all of those elements is returned.
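
The single-element and n-element forms can be sketched as below. The filtered iterator is just an example of a stream whose length is not known in advance, and the square-root weight function is hypothetical.

```julia
using StreamSampling
using Random

rng = Xoshiro(11)  # arbitrary seed
iter = Iterators.filter(iseven, 1:10^5)  # length not known up front

# One random element (nothing would be returned for an empty iterator).
one_el = itsample(rng, iter)

# Five elements, default method for the unweighted n-element form.
five = itsample(rng, iter, 5)

# Five elements with a weight function applied to each element.
weighted = itsample(rng, iter, x -> sqrt(x), 5)

# Five elements kept in the same order as they appear in iter.
ordered = itsample(rng, iter, 5; ordered = true)
```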


itsample(rngs, iters, n::Int)
itsample(rngs, iters, wfuncs, n::Int)

Parallel implementation that returns a sample with replacement of size n drawn from multiple iterables. All arguments except n must be tuples.
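
A sketch of the parallel form under the stated requirement that everything except n is a tuple. The two ranges and the seeds are illustrative; one rng per iterable is assumed.

```julia
using StreamSampling
using Random

# One rng and one iterable per stream, all wrapped in tuples.
rngs = (Xoshiro(1), Xoshiro(2))
iters = (1:500, 501:1000)

# A sample with replacement of size 10 drawn across both iterables.
combined = itsample(rngs, iters, 10)
```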


Sampling Algorithms

StreamSampling.AlgRSWRSKIP (Type)

Implements random reservoir sampling with replacement. To be used with ReservoirSample or itsample.

Adapted from algorithm RSWR-SKIP described in "Reservoir-based Random Sampling with Replacement from Data Stream, B. Park et al., 2008".

StreamSampling.AlgARes (Type)

Implements weighted random reservoir sampling without replacement. To be used with ReservoirSample or itsample.

Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
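
To give an idea of how A-Res works, here is a standalone sketch of its core rule: every element with weight w > 0 draws a key u^(1/w) with u uniform in (0, 1), and the n elements with the largest keys are kept. This is not the package's implementation; the function name ares_sketch is hypothetical, and it uses a linear scan where an efficient version would use a heap.

```julia
using Random

# Illustrative A-Res sketch: keep the n elements with the largest keys,
# where the key of an element with weight w is rand()^(1 / w).
function ares_sketch(rng::AbstractRNG, iter, wfunc, n::Int)
    ks = Float64[]            # keys of the elements currently kept
    items = eltype(iter)[]    # the elements currently kept
    for el in iter
        key = rand(rng)^(1 / wfunc(el))
        if length(items) < n
            # Reservoir not full yet: always keep the element.
            push!(ks, key)
            push!(items, el)
        else
            # Replace the element with the smallest key if the new key is larger.
            i = argmin(ks)
            if key > ks[i]
                ks[i] = key
                items[i] = el
            end
        end
    end
    return items
end

# Weights proportional to 1 + x, made up for illustration.
picked = ares_sketch(Xoshiro(5), 1:1000, x -> 1.0 + x, 10)
```

Because each element enters the reservoir at most once, the result is a sample without replacement.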

StreamSampling.AlgAExpJ (Type)

Implements weighted random reservoir sampling without replacement. To be used with ReservoirSample or itsample.

Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".

StreamSampling.AlgWRSWRSKIP (Type)

Implements weighted random reservoir sampling with replacement. To be used with ReservoirSample or itsample.

Adapted from algorithm WRSWR-SKIP described in "Weighted Reservoir Sampling with Replacement from Multiple Data Streams, A. Meligrana, 2024".

StreamSampling.AlgD (Type)

Implements random sampling without replacement. To be used with StreamSample or itsample.

Adapted from algorithm D described in "An Efficient Algorithm for Sequential Random Sampling, J. S. Vitter, 1987".
