API
Types
StreamSampling.ReservoirSampler — Type
ReservoirSampler{T}([rng], method = AlgRSWRSKIP())
ReservoirSampler{T}([rng], n::Int, method = AlgL(); ordered = false)
Initializes a reservoir sampler with elements of type T.
The first signature represents a sample where only a single element is collected. If ordered is true, the sampled values can be retrieved in the order they were collected using ordvalue.
Look at the Sampling Algorithms section for the supported methods.
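As a minimal usage sketch, the following feeds a stream into a multi-element reservoir and reads the result back. It assumes that fit!, value and nobs (documented under Methods) are exported by StreamSampling, and it uses Xoshiro from the Random standard library as the optional rng; the stream 1:1000 is only an illustration.

```julia
using StreamSampling
using Random

rng = Xoshiro(42)

# Reservoir holding 5 Ints, with the default method from the signature above.
rs = ReservoirSampler{Int}(rng, 5)

# Feed the stream one element at a time.
for x in 1:1000
    fit!(rs, x)
end

sample = value(rs)   # the 5 elements currently held
seen = nobs(rs)      # how many elements were observed so far
```

Because the default method samples without replacement, the five elements are distinct positions of the stream.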
StreamSampling.SequentialSampler — Type
SequentialSampler{T}([rng], iter, n, [N], method = AlgD())
Initializes a sequential sampler, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in SequentialSampler require knowing the total number of elements in the stream, N. If N is not provided, it is assumed to be available by calling length(iter).
SequentialSampler{T}([rng], iter, wfunc, n, W, method = AlgORDWSWR())
Initializes a weighted sequential sampler, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in SequentialSampler for weighted streams require knowing the total weight of the stream, W, and a weight function wfunc specifying how to map an element to its weight.
SequentialSampler([rng], n::Integer, N::Integer, method = AlgD())
Initializes a sequential sampler, which can then be iterated over to return n ordered indices between 1 and N, respecting the sampling scheme of the selected method, which can be AlgD(), AlgHiddenShuffle() or AlgORDSWR().
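To make "n ordered indices between 1 and N" concrete, here is a sketch in plain Julia based on selection sampling (often called Algorithm S). It is not the package's AlgD, which yields the same kind of output while skipping over indices instead of visiting each one; the function name ordered_indices is hypothetical.

```julia
using Random

# Selection sampling: visit 1..N once and keep index i with probability
# (indices still needed) / (indices still remaining).
function ordered_indices(rng::AbstractRNG, n::Integer, N::Integer)
    0 <= n <= N || throw(ArgumentError("need 0 <= n <= N"))
    picked = Int[]
    sizehint!(picked, n)
    for i in 1:N
        needed = n - length(picked)
        needed == 0 && break
        remaining = N - i + 1
        # rand(rng) is uniform in [0, 1), so the test always succeeds
        # once remaining == needed, which guarantees exactly n picks.
        if rand(rng) * remaining < needed
            push!(picked, i)
        end
    end
    return picked
end

idx = ordered_indices(Xoshiro(1), 5, 100)
```

The result is increasing by construction, since indices are only ever appended in the order they are visited.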
Methods
StatsAPI.fit! — Function
fit!(rs::AbstractReservoirSampler, el)
fit!(rs::AbstractReservoirSampler, el, w)
Updates the reservoir sample by taking into account the element passed. If the sampling is weighted, the weight of the element must also be passed.
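A short sketch of the weighted form of fit! follows. It assumes that the algorithm types listed under Algorithms (here AlgAExpJ) are exported, and the weight function is a hypothetical choice made only for illustration.

```julia
using StreamSampling

# Weighted reservoir of 3 elements, sampled without replacement.
rs = ReservoirSampler{Int}(3, AlgAExpJ())

# Hypothetical weight function: larger elements are more likely to be kept.
weight(x) = Float64(x)

for x in 1:100
    fit!(rs, x, weight(x))   # element first, then its weight
end

kept = value(rs)
```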
Base.merge! — Function
Base.merge!(rs::AbstractReservoirSampler...)
Updates the first reservoir sampler by merging its value with the values of the other samplers. The number of elements after merging will be the minimum number of elements in the merged reservoirs.
Base.merge — Function
Base.merge(rs::AbstractReservoirSampler...)
Creates a new reservoir sampler by merging the values of the samplers passed. The number of elements in the new sampler will be the minimum number of elements in the merged reservoirs.
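One situation where merging helps is when a stream is split into chunks that are sampled independently. The sketch below assumes that merge is supported for the default method of the two samplers; the chunk boundaries are arbitrary.

```julia
using StreamSampling

# Two reservoirs of the same size, each fed a different chunk of the data.
rs1 = ReservoirSampler{Int}(10)
rs2 = ReservoirSampler{Int}(10)

for x in 1:500
    fit!(rs1, x)
end
for x in 501:1000
    fit!(rs2, x)
end

# A new sampler standing for a sample of the whole range 1:1000;
# rs1 and rs2 are left untouched.
rs = merge(rs1, rs2)
merged = value(rs)
```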
StreamSampling.combine — Function
combine([rng], samples::AbstractArray, weights::AbstractArray)
Combines different sequential samples into a single one. The number of elements in the new sample will be the minimum number of elements in the samples. weights should contain the total weight of each stream, which in the unweighted case coincides with the length of the stream.
Base.empty! — Function
Base.empty!(rs::AbstractReservoirSampler)
Resets the reservoir sample to its initial state. Useful to avoid allocating a new sampler in some cases.
OnlineStatsBase.value — Function
value(rs::AbstractReservoirSampler)
Returns the elements collected in the sample at the current sampling stage.
If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far instead.
Note that even though the sampling respects the scheme assigned when the ReservoirSampler is instantiated, some orderings of the sample can be more probable than others. To make every ordering equally probable, call fshuffle! on the result.
StreamSampling.ordvalue — Function
ordvalue(rs::AbstractReservoirSampler)
Returns the elements collected in the sample at the current sampling stage in the order they were collected. This applies only when ordered = true is passed in ReservoirSampler.
If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far instead.
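The difference from value shows up most clearly on an increasing stream. This sketch assumes ordvalue is exported together with the other methods on this page.

```julia
using StreamSampling

# ordered = true is required for ordvalue to apply.
rs = ReservoirSampler{Int}(5; ordered = true)

for x in 1:100
    fit!(rs, x)
end

in_order = ordvalue(rs)   # elements in the order they appeared in the stream
```

Since the stream 1:100 is increasing, a sample returned in collection order is increasing as well.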
StatsAPI.nobs — Function
nobs(rs::AbstractReservoirSampler)
Returns the total number of elements that have been observed so far during the sampling process.
StreamSampling.itsample — Function
itsample([rng], iter, method = AlgRSWRSKIP())
itsample([rng], iter, wfunc, method = AlgWRSWRSKIP())
Return a random element of the iterator, optionally specifying a rng (which defaults to Random.default_rng()) and a function wfunc which accepts each element as input and outputs the corresponding weight. If the iterator is empty, it returns nothing.
itsample([rng], iter, n::Int, method = AlgL(); ordered = false)
itsample([rng], iter, wfunc, n::Int, method = AlgAExpJ(); ordered = false)
Return a vector of n random elements of the iterator, optionally specifying a rng (which defaults to Random.default_rng()), a weight function wfunc specifying how to map an element to its weight, and a method. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in iter) must be collected.
When sampling without replacement, if the iterator has fewer than n elements, a vector of all those elements is returned.
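The four call forms can be exercised on a lazy iterator whose length is not known in advance. In this sketch the stream and the weight function are illustrative choices, not part of the package.

```julia
using StreamSampling

# A lazy stream of the odd numbers in 1:1000, with no known length.
stream = Iterators.filter(isodd, 1:1000)

single = itsample(stream)                      # one random element
five = itsample(stream, 5)                     # 5 elements, default method
inorder = itsample(stream, 5; ordered = true)  # 5 elements in stream order

# Weighted variant with a hypothetical weight function favouring large values.
wfive = itsample(stream, x -> Float64(x), 5)
```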
Algorithms
StreamSampling.AlgR — Type
Implements random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm R described in "Random sampling with a reservoir, J. S. Vitter, 1985".
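For readers unfamiliar with the technique, the textbook form of algorithm R can be sketched in a few lines of plain Julia. This is an illustration of the idea, not the package's implementation; the name reservoir_r is hypothetical.

```julia
using Random

# Keep the first n elements, then let element number i (i > n) overwrite a
# uniformly chosen slot with probability n / i.
function reservoir_r(rng::AbstractRNG, iter, n::Integer)
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, n)
    for (i, x) in enumerate(iter)
        if i <= n
            push!(reservoir, x)
        else
            j = rand(rng, 1:i)        # uniform in 1..i
            if j <= n                 # true with probability n / i
                reservoir[j] = x
            end
        end
    end
    return reservoir
end

s = reservoir_r(Xoshiro(7), 1:10_000, 10)
```

Every element costs one random number, which is the overhead that skip-based methods avoid.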
StreamSampling.AlgL — Type
Implements random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm L described in "Random sampling with a reservoir, J. S. Vitter, 1985".
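The skip-based scheme usually presented under the name algorithm L can be sketched as follows in plain Julia. This is a hedged illustration rather than the package's code: the name reservoir_l is hypothetical, and randexp from the Random standard library is used as a numerically safe stand-in for -log(U) with U uniform.

```julia
using Random

function reservoir_l(rng::AbstractRNG, iter, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, n)
    w = 1.0       # running value W of the textbook description
    skip = 0      # elements to pass over before the next replacement
    for x in iter
        if length(reservoir) < n
            push!(reservoir, x)
            length(reservoir) == n || continue
        elseif skip > 0
            skip -= 1
            continue
        else
            reservoir[rand(rng, 1:n)] = x   # replace a uniformly chosen slot
        end
        # Reached only right after the reservoir fills or after a replacement.
        w *= exp(-randexp(rng) / n)                    # W := W * U^(1/n)
        skip = floor(Int, -randexp(rng) / log1p(-w))   # floor(log(U) / log(1 - W))
    end
    return reservoir
end

s = reservoir_l(Xoshiro(3), 1:100_000, 10)
```

Random numbers are drawn only when an element enters the reservoir, so long stretches of the stream are passed over with a simple counter decrement.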
StreamSampling.AlgRSWRSKIP — Type
Implements random reservoir sampling with replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm RSWR-SKIP described in "Reservoir-based Random Sampling with Replacement from Data Stream, B. Park et al., 2008".
StreamSampling.AlgARes — Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir, P. S. Efraimidis, P. G. Spirakis, 2006".
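The core idea of A-Res is to give each element a random key u^(1/w), with u uniform and w the element's weight, and to keep the n elements with the largest keys. The plain-Julia sketch below works with the logarithm of that key, which preserves the ordering. It makes two assumptions that are not taken from the package: elements with non-positive weight are skipped, and a linear scan replaces the heap one would normally use. The name weighted_reservoir is hypothetical.

```julia
using Random

function weighted_reservoir(rng::AbstractRNG, iter, wfunc, n::Integer)
    n >= 1 || throw(ArgumentError("n must be at least 1"))
    items = Vector{eltype(iter)}()
    logkeys = Float64[]
    for x in iter
        w = wfunc(x)
        w > 0 || continue             # assumption: ignore non-positive weights
        # log(u^(1/w)) = log(u) / w, and -randexp(rng) is distributed as log(u).
        key = -randexp(rng) / w
        if length(items) < n
            push!(items, x)
            push!(logkeys, key)
        else
            kmin, j = findmin(logkeys)   # smallest key currently held
            if key > kmin
                items[j] = x
                logkeys[j] = key
            end
        end
    end
    return items
end

# Only 1, 2 and 3 carry weight, so they are the only possible picks.
picked = weighted_reservoir(Xoshiro(5), 1:10, x -> x <= 3 ? 1.0 : 0.0, 3)
```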
StreamSampling.AlgAExpJ — Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir, P. S. Efraimidis, P. G. Spirakis, 2006".
StreamSampling.AlgWRSWRSKIP — Type
Implements weighted random reservoir sampling with replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm WRSWR-SKIP described in "Investigating Methods for Weighted Reservoir Sampling with Replacement, A. Meligrana, 2024".
StreamSampling.AlgD — Type
Implements random stream sampling without replacement. To be used with SequentialSampler or itsample.
Adapted from algorithm D described in "An Efficient Algorithm for Sequential Random Sampling, J. S. Vitter, 1987".
StreamSampling.AlgHiddenShuffle — Type
Implements random stream sampling without replacement. To be used with SequentialSampler or itsample.
Adapted from algorithm HiddenShuffle described in "Sequential Random Sampling Revisited: Hidden Shuffle Method, M. Shekelyan, G. Cormode, 2021".
StreamSampling.AlgORDSWR — Type
Implements random stream sampling with replacement. To be used with SequentialSampler or itsample.
Adapted from algorithm 4 described in "Generating Sorted Lists of Random Numbers, J. L. Bentley, J. B. Saxe, 1980".
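Ordered sampling with replacement amounts to generating sorted uniform variates in a single pass and mapping them to positions. The sketch below uses a classic recurrence for that purpose: the maximum of k uniforms on [0, m) is distributed as m * U^(1/k). Whether this corresponds to the numbered algorithm of the cited paper, or to the package's code, is not claimed here; the function name is hypothetical.

```julia
using Random

# Returns n indices in 1..N, drawn with replacement, in nondecreasing order.
function sorted_indices_with_replacement(rng::AbstractRNG, n::Integer, N::Integer)
    N >= 1 || throw(ArgumentError("N must be positive"))
    out = Vector{Int}(undef, n)
    curmax = 1.0
    for k in n:-1:1
        # Largest of the k remaining uniforms, all lying below curmax.
        curmax *= rand(rng)^(1 / k)
        # 1 - curmax grows as curmax shrinks, which gives ascending output;
        # clamp guards against rounding at the edges of the interval.
        out[n - k + 1] = clamp(ceil(Int, N * (1 - curmax)), 1, N)
    end
    return out
end

idx = sorted_indices_with_replacement(Xoshiro(11), 10, 50)
```

Unlike the without-replacement case, repeated indices are possible and expected when n is not small relative to N.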
StreamSampling.AlgORDWSWR — Type
Implements weighted random stream sampling with replacement. To be used with SequentialSampler.
Adapted from algorithm 3 described in "An asymptotically optimal, online algorithm for weighted random sampling with replacement, M. Startek, 2016".