API

Types

StreamSampling.ReservoirSampler (Type)
ReservoirSampler{T}([rng], method = AlgRSWRSKIP())
ReservoirSampler{T}([rng], n::Int, method = AlgL(); ordered = false)

Initializes a reservoir sampler with elements of type T.

The first signature creates a sampler that collects a single element; the second collects n elements. If ordered is true, the sampled values can be retrieved in the order they were collected using ordvalue.

Look at the Sampling Algorithms section for the supported methods.
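
As a rough illustration of the two constructors, the sketch below feeds a stream of integers into a multi-element sampler and into a single-element sampler. The seed, the stream and the variable names are arbitrary choices made for this example; fit! and value are the methods documented further down this page and are assumed to be exported by the package.

```julia
using StreamSampling, Random

rng = Xoshiro(42)  # arbitrary seed, used only for reproducibility

# Second signature: reservoir of n = 5 elements, default method AlgL()
rs = ReservoirSampler{Int}(rng, 5)
for x in 1:100
    fit!(rs, x)
end
sample = value(rs)  # vector holding 5 of the 100 streamed integers

# First signature: only a single element is kept
rs_single = ReservoirSampler{Int}(rng)
before = value(rs_single)  # nothing, since no element has been seen yet
for x in 1:100
    fit!(rs_single, x)
end
one = value(rs_single)  # a single Int drawn from 1:100
```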

StreamSampling.StreamSampler (Type)
StreamSampler{T}([rng], iter, n, [N], method = AlgD())

Initializes a stream sampler, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in StreamSampler require knowledge of the total number of elements in the stream, N. If N is not provided, it is assumed to be available by calling length(iter).
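
A minimal sketch of how such a sampler might be used is shown below. The range 1:1000, the sample size and the seed are arbitrary; a plain for loop is used because the text only states that the sampler can be iterated over.

```julia
using StreamSampling, Random

rng = Xoshiro(1)  # arbitrary seed

# N is omitted, so it is taken from length(1:1000)
ss = StreamSampler{Int}(rng, 1:1000, 10)
picked = Int[]
for x in ss
    push!(picked, x)
end

# Same sampler with the stream length N passed explicitly
ss_explicit = StreamSampler{Int}(rng, 1:1000, 10, 1000)
picked_explicit = Int[]
for x in ss_explicit
    push!(picked_explicit, x)
end
```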


Methods

StatsAPI.fit! (Function)
fit!(rs::AbstractReservoirSampler, el)
fit!(rs::AbstractReservoirSampler, el, w)

Updates the reservoir sample by taking into account the element passed. If the sampling is weighted, the weight w of the element must also be passed.
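
The weighted form could look like the sketch below, which assumes a sampler built with the weighted method AlgAExpJ documented in the Algorithms section. The elements and weights are made up for illustration.

```julia
using StreamSampling, Random

rng = Xoshiro(5)  # arbitrary seed

# Weighted sampling without replacement of 2 elements
rs = ReservoirSampler{Symbol}(rng, 2, AlgAExpJ())

elements = [:a, :b, :c, :d]
weights = [0.1, 5.0, 1.0, 2.5]  # illustrative weights, one per element
for (el, w) in zip(elements, weights)
    fit!(rs, el, w)  # each element is passed together with its weight
end

chosen = value(rs)  # 2 distinct symbols; heavier ones are more likely
```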

Base.merge! (Function)
Base.merge!(rs::AbstractReservoirSampler, rs_others::AbstractReservoirSampler...)

Updates the first reservoir sampler by merging its value with the values of the other samplers. The number of elements after merging will be the minimum number of elements in the merged reservoirs.

Base.merge (Function)
Base.merge(rs_all::AbstractReservoirSampler...)

Creates a new reservoir sampler by merging the values of the samplers passed. The number of elements in the new sampler will be the minimum number of elements in the merged reservoirs.
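
Merging is typically useful when separate parts of a stream are sampled independently. The sketch below assumes that samplers built with AlgRSWRSKIP support merging; whether every method does is not stated here, so treat that choice as an assumption. The two halves of the stream and the seed are arbitrary.

```julia
using StreamSampling, Random

rng = Xoshiro(7)  # arbitrary seed

# Two samplers of the same size, each fed with half of the stream
rs1 = ReservoirSampler{Int}(rng, 5, AlgRSWRSKIP())
rs2 = ReservoirSampler{Int}(rng, 5, AlgRSWRSKIP())
for x in 1:500
    fit!(rs1, x)
end
for x in 501:1000
    fit!(rs2, x)
end

combined = merge(rs1, rs2)  # new sampler, rs1 and rs2 are left as they were
merged_sample = value(combined)

merge!(rs1, rs2)            # in-place variant: rs1 now holds the merged sample
inplace_sample = value(rs1)
```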

Base.empty! (Function)
Base.empty!(rs::AbstractReservoirSampler)

Resets the reservoir sample to its initial state. Useful to avoid allocating a new sampler in some cases.
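
For instance, one sampler might be reused across several streams instead of constructing a new one each time. The sketch below is only an illustration: the streams are arbitrary, and nobs (documented further down) is assumed to be exported and to report zero after a reset.

```julia
using StreamSampling, Random

rng = Xoshiro(11)  # arbitrary seed
rs = ReservoirSampler{Int}(rng, 3)

for x in 1:50
    fit!(rs, x)
end
seen_first = nobs(rs)   # 50 elements observed so far

empty!(rs)              # back to the initial state, no new allocation
seen_reset = nobs(rs)   # assumed to be 0 after the reset

for x in 1001:1020      # reuse the same sampler on a second stream
    fit!(rs, x)
end
second_sample = value(rs)
```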

OnlineStatsBase.value (Function)
value(rs::AbstractReservoirSampler)

Returns the elements collected in the sample at the current sampling stage.

If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far, which may contain fewer elements than requested.

Note that although the sample follows the sampling scheme chosen when the ReservoirSampler was instantiated, some orderings of the sample can be more probable than others. To make every ordering equally probable, call fshuffle! on the result.

StreamSampling.ordvalue (Function)
ordvalue(rs::AbstractReservoirSampler)

Returns the elements collected in the sample at the current sampling stage, in the order they were collected. This applies only when ordered = true is passed to ReservoirSampler.

If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far instead.
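
A small sketch of the ordered case follows. Because the stream used here is increasing, returning the elements in collection order means the result should come out sorted; the stream, the seed and the names are arbitrary.

```julia
using StreamSampling, Random

rng = Xoshiro(3)  # arbitrary seed

# ordered = true must be set at construction time to use ordvalue
rs = ReservoirSampler{Int}(rng, 5; ordered = true)
for x in 1:100
    fit!(rs, x)
end

in_order = ordvalue(rs)  # sampled elements in the order they were streamed
```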

StatsAPI.nobs (Function)
nobs(rs::AbstractReservoirSampler)

Returns the total number of elements that have been observed so far during the sampling process.

StreamSampling.itsample (Function)
itsample([rng], iter, method = AlgRSWRSKIP())
itsample([rng], iter, wfunc, method = AlgWRSWRSKIP())

Return a random element of the iterator, optionally specifying a rng (which defaults to Random.default_rng()) and a function wfunc which accepts each element as input and outputs the corresponding weight. If the iterator is empty, it returns nothing.


itsample([rng], iter, n::Int, method = AlgL(); ordered = false)
itsample([rng], iter, wfunc, n::Int, method = AlgAExpJ(); ordered = false)

Return a vector of n random elements of the iterator, optionally specifying a rng (which defaults to Random.default_rng()), a weight function wfunc, and a method. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in iter) must be collected.

If the iterator has fewer than n elements, in the case of sampling without replacement, it returns a vector of those elements.
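
The calls below sketch the main forms described above. The iterators, the seed and the weight function (which simply uses each element's own value as its weight) are arbitrary choices for illustration.

```julia
using StreamSampling, Random

rng = Xoshiro(9)  # arbitrary seed

single = itsample(rng, 1:100)                     # one random element
none = itsample(rng, Int[])                       # empty iterator: nothing

multi = itsample(rng, 1:100, 5)                   # 5 elements, default AlgL()
ordered = itsample(rng, 1:100, 5; ordered = true) # same order as in the iterator

# Weighted version: larger numbers are more likely to be picked
weighted = itsample(rng, 1:100, x -> Float64(x), 5)

# Fewer than n elements available: all of them are returned
short = itsample(rng, 1:3, 5)
```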


Algorithms

StreamSampling.AlgARes (Type)

Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.

Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
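
To give an idea of how A-Res works, here is a self-contained sketch written for this page, not the package's implementation. Each element with weight w receives the key u^(1/w), with u drawn uniformly from (0, 1), and the n elements with the largest keys are kept. The function name ares_sketch is hypothetical, and a linear scan replaces the heap an efficient implementation would normally use.

```julia
using Random

# Illustrative A-Res sketch (assumes n >= 1 and strictly positive weights)
function ares_sketch(rng, items, weights, n)
    reservoir = Tuple{Float64, eltype(items)}[]  # (key, element) pairs
    for (x, w) in zip(items, weights)
        key = rand(rng)^(1 / w)  # larger weights push the key toward 1
        if length(reservoir) < n
            push!(reservoir, (key, x))
        else
            # Find the smallest key currently in the reservoir
            minkey, i = findmin(first, reservoir)
            if key > minkey
                reservoir[i] = (key, x)  # replace the weakest entry
            end
        end
    end
    return [x for (_, x) in reservoir]
end

items = [:a, :b, :c, :d]
weights = [0.1, 5.0, 1.0, 2.5]
picked = ares_sketch(Xoshiro(3), items, weights, 2)
```

With the package itself, the same kind of sample would instead be obtained by passing AlgARes() as the method to ReservoirSampler or itsample.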

StreamSampling.AlgAExpJ (Type)

Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.

Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".

StreamSampling.AlgWRSWRSKIP (Type)

Implements weighted random reservoir sampling with replacement. To be used with ReservoirSampler or itsample.

Adapted from algorithm WRSWR-SKIP described in "Investigating Methods for Weighted Reservoir Sampling with Replacement, A. Meligrana, 2024".

StreamSampling.AlgD (Type)

Implements random stream sampling without replacement. To be used with StreamSampler or itsample.

Adapted from algorithm D described in "An Efficient Algorithm for Sequential Random Sampling, J. S. Vitter, 1987".
