API
Types
StreamSampling.ReservoirSampler — Type
ReservoirSampler{T}([rng], method = AlgRSWRSKIP())
ReservoirSampler{T}([rng], n::Int, method = AlgL(); ordered = false)
Initializes a reservoir sampler with elements of type T.
The first signature creates a sampler that collects only a single element. If ordered is true, the sampled values can be retrieved in the order they were collected using ordvalue.
See the Sampling Algorithms section for the supported methods.
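A minimal usage sketch of the second signature, assuming that fit!, value and nobs are exported by StreamSampling as documented on this page. The stream 1:100, the seed and the reservoir size are arbitrary illustrative choices.

```julia
using StreamSampling
using Random

rng = Xoshiro(42)                      # explicit RNG so the run is reproducible

# Reservoir of 5 Ints; the default method for this signature is AlgL()
rs = ReservoirSampler{Int}(rng, 5)

# Feed the stream one element at a time
for x in 1:100
    fit!(rs, x)
end

sample = value(rs)                     # elements currently held in the reservoir
seen = nobs(rs)                        # number of elements observed so far
```

Because the sampler is updated incrementally, the stream never has to be materialized in memory.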
StreamSampling.StreamSampler — Type
StreamSampler{T}([rng], iter, n, [N], method = AlgD())
Initializes a stream sampler, which can then be iterated over to return the sampled elements of the iterable iter, which is assumed to have an eltype of T. The methods implemented in StreamSampler require knowing the total number of elements in the stream N; if N is not provided, it is assumed to be available by calling length(iter).
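As a rough illustration, the sketch below iterates a StreamSampler over a range. N is omitted, so it is assumed to be taken from length(stream) as described above; the seed and sizes are arbitrary.

```julia
using StreamSampling
using Random

rng = Xoshiro(7)
stream = 1:1000

# Sample 10 elements with the default method AlgD(); N defaults to length(stream)
ss = StreamSampler{Int}(rng, stream, 10)

# The sampler is itself an iterator over the sampled elements
picked = Int[]
for x in ss
    push!(picked, x)
end
```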
Methods
StatsAPI.fit! — Function
fit!(rs::AbstractReservoirSampler, el)
fit!(rs::AbstractReservoirSampler, el, w)
Updates the reservoir sample by taking into account the element passed. If the sampling is weighted, the weight of the element must also be passed.
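The weighted form might be used as in the following sketch, which assumes a sampler built with one of the weighted methods listed under Algorithms (here AlgAExpJ). The items and weights are made up for illustration.

```julia
using StreamSampling
using Random

rng = Xoshiro(1)
items = ["a", "b", "c", "d"]           # illustrative data
weights = [1.0, 2.0, 3.0, 4.0]         # illustrative weights, one per item

# Weighted reservoir of size 2, without replacement
rs = ReservoirSampler{String}(rng, 2, AlgAExpJ())

# Each update passes both the element and its weight
for (el, w) in zip(items, weights)
    fit!(rs, el, w)
end

picked = value(rs)
```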
Base.merge! — Function
Base.merge!(rs::AbstractReservoirSampler, rs_others::AbstractReservoirSampler...)
Updates the first reservoir sampler by merging its value with the values of the other samplers. The number of elements after merging will be the minimum number of elements in the merged reservoirs.
Base.merge — Function
Base.merge(rs_all::AbstractReservoirSampler...)
Creates a new reservoir sampler by merging the values of the samplers passed. The number of elements in the new sampler will be the minimum number of elements in the merged reservoirs.
Base.empty! — Function
Base.empty!(rs::AbstractReservoirSampler)
Resets the reservoir sample to its initial state. Useful to avoid allocating a new sampler in some cases.
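One possible use is to sample several batches with a single sampler, as sketched below. The batches are illustrative, and value is copied defensively on the assumption that it may return the sampler's internal storage.

```julia
using StreamSampling
using Random

rng = Xoshiro(3)
rs = ReservoirSampler{Int}(rng, 3)

batch_samples = Vector{Vector{Int}}()
for batch in (1:50, 51:100)
    empty!(rs)                          # reset instead of allocating a new sampler
    for x in batch
        fit!(rs, x)
    end
    push!(batch_samples, copy(value(rs)))
end
```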
OnlineStatsBase.value — Function
value(rs::AbstractReservoirSampler)
Returns the elements collected in the sample at the current sampling stage.
If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far.
Note that even though the sample respects the sampling scheme assigned when ReservoirSampler is instantiated, some orderings of the sample can be more probable than others. To make every ordering equally probable, call fshuffle! on the result.
StreamSampling.ordvalue — Function
ordvalue(rs::AbstractReservoirSampler)
Returns the elements collected in the sample at the current sampling stage in the order they were collected. This applies only when ordered = true is passed in ReservoirSampler.
If the sampler is empty, it returns nothing for single element sampling. For multi-valued samplers, it always returns the sample collected so far.
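A short sketch of ordered sampling, assuming the ordered keyword behaves as described above. The stream here is increasing, so a sample returned in collection order should also be sorted.

```julia
using StreamSampling
using Random

rng = Xoshiro(11)

# ordered = true makes the sampler keep track of collection order
rs = ReservoirSampler{Int}(rng, 5; ordered = true)
for x in 1:1000
    fit!(rs, x)
end

in_order = ordvalue(rs)                 # sample in the order it was collected
```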
StatsAPI.nobs — Function
nobs(rs::AbstractReservoirSampler)
Returns the total number of elements that have been observed so far during the sampling process.
StreamSampling.itsample — Function
itsample([rng], iter, method = AlgRSWRSKIP())
itsample([rng], iter, wfunc, method = AlgWRSWRSKIP())
Returns a random element of the iterator, optionally specifying a rng (which defaults to Random.default_rng()) and a function wfunc which accepts each element as input and outputs the corresponding weight. If the iterator is empty, it returns nothing.
itsample([rng], iter, n::Int, method = AlgL(); ordered = false)
itsample([rng], iter, wfunc, n::Int, method = AlgAExpJ(); ordered = false)
Returns a vector of n random elements of the iterator, optionally specifying a rng (which defaults to Random.default_rng()), a weight function wfunc and a method. ordered dictates whether an ordered sample (also called a sequential sample, i.e. a sample where items appear in the same order as in iter) must be collected.
If the iterator has fewer than n elements, in the case of sampling without replacement, it returns a vector of those elements.
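The signatures above might be exercised as in the following sketch. The filtered ranges stand in for streams whose length is not known in advance, and the weight function is an arbitrary illustrative choice.

```julia
using StreamSampling
using Random

rng = Xoshiro(2024)
stream = Iterators.filter(isodd, 1:1000)    # length not known in advance

single = itsample(rng, stream)                         # one random element
five = itsample(rng, stream, 5)                        # 5 elements, default AlgL()
weighted = itsample(rng, stream, x -> Float64(x), 5)   # weight proportional to the value
seq = itsample(rng, stream, 5; ordered = true)         # kept in stream order

# Fewer than n elements available: all of them are returned
short = itsample(rng, Iterators.filter(<=(3), 1:10), 5)
```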
Algorithms
StreamSampling.AlgR — Type
Implements random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm R described in "Random sampling with a reservoir, J. S. Vitter, 1985".
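For intuition, the textbook form of algorithm R can be sketched in a few lines using only the standard library. This is not the package's implementation; the function name algorithm_r is hypothetical.

```julia
using Random

# Textbook algorithm R: keep the first n elements, then let the i-th element
# replace a uniformly chosen slot with probability n / i.
function algorithm_r(rng::AbstractRNG, iter, n::Int)
    reservoir = Vector{eltype(iter)}()
    seen = 0
    for x in iter
        seen += 1
        if seen <= n
            push!(reservoir, x)         # fill phase
        else
            j = rand(rng, 1:seen)       # uniform index in 1:seen
            if j <= n
                reservoir[j] = x        # happens with probability n / seen
            end
        end
    end
    return reservoir
end

r_sample = algorithm_r(Xoshiro(1), 1:10_000, 10)
```

Every element costs one random draw, which is the overhead that skip-based variants avoid.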
StreamSampling.AlgL — Type
Implements random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm L described in "Random sampling with a reservoir, J. S. Vitter, 1985".
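The skip-based idea behind algorithm L can be sketched as follows, again with the standard library only. This follows the commonly published formulation and is not the package's code; algorithm_l is a hypothetical name.

```julia
using Random

# Textbook algorithm L: after the reservoir is full, draw how many elements
# to skip before the next replacement instead of testing every element.
function algorithm_l(rng::AbstractRNG, iter, n::Int)
    reservoir = Vector{eltype(iter)}()
    w = 0.0
    next = 0                            # position of the next element to accept
    seen = 0
    unif() = 1 - rand(rng)              # uniform in (0, 1], avoids log(0)
    for x in iter
        seen += 1
        if seen <= n
            push!(reservoir, x)         # fill phase
            if seen == n
                w = exp(log(unif()) / n)
                next = n + floor(Int, log(unif()) / log1p(-w)) + 1
            end
        elseif seen == next
            reservoir[rand(rng, 1:n)] = x   # replace a uniformly chosen slot
            w *= exp(log(unif()) / n)
            next += floor(Int, log(unif()) / log1p(-w)) + 1
        end
    end
    return reservoir
end

l_sample = algorithm_l(Xoshiro(1), 1:10_000, 10)
```

Random numbers are drawn only when an element is accepted, so the number of draws grows roughly with n log(N / n) rather than with the stream length N.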
StreamSampling.AlgRSWRSKIP — Type
Implements random reservoir sampling with replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm RSWR-SKIP described in "Reservoir-based Random Sampling with Replacement from Data Stream, B. Park et al., 2008".
StreamSampling.AlgARes — Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm A-Res described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
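The key-based idea of A-Res can be sketched as below: each element gets the key u^(1/w) with u uniform, and the n largest keys are kept. This is a simplified illustration, not the package's implementation; a_res is a hypothetical name, and a linear scan replaces the heap a real implementation would likely use.

```julia
using Random

# Textbook A-Res: assign each element the key u^(1/w) and keep the n
# elements with the largest keys.
function a_res(rng::AbstractRNG, iter, wfunc, n::Int)
    items = Vector{eltype(iter)}()
    priorities = Float64[]
    for x in iter
        k = rand(rng)^(1 / wfunc(x))
        if length(items) < n
            push!(items, x)             # fill phase
            push!(priorities, k)
        else
            kmin, i = findmin(priorities)   # O(n) scan; a heap would be faster
            if k > kmin
                items[i] = x            # evict the element with the smallest key
                priorities[i] = k
            end
        end
    end
    return items
end

w_sample = a_res(Xoshiro(3), 1:100, x -> Float64(x), 5)
```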
StreamSampling.AlgAExpJ — Type
Implements weighted random reservoir sampling without replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm A-ExpJ described in "Weighted random sampling with a reservoir, P. S. Efraimidis et al., 2006".
StreamSampling.AlgWRSWRSKIP — Type
Implements weighted random reservoir sampling with replacement. To be used with ReservoirSampler or itsample.
Adapted from algorithm WRSWR-SKIP described in "Investigating Methods for Weighted Reservoir Sampling with Replacement, A. Meligrana, 2024".
StreamSampling.AlgD — Type
Implements random stream sampling without replacement. To be used with StreamSampler or itsample.
Adapted from algorithm D described in "An Efficient Algorithm for Sequential Random Sampling, J. S. Vitter, 1987".
StreamSampling.AlgORDSWR — Type
Implements random stream sampling with replacement. To be used with StreamSampler or itsample.
Adapted from algorithm 4 described in "Generating Sorted Lists of Random Numbers, J. L. Bentley et al., 1980".