An Illustrative Example
Suppose to receive data about some process in the form of a stream and you want to detect if anything is going wrong in the data being received. A reservoir sampling approach could be useful to evaluate properties on the data stream. This is a demonstration of such a use case using StreamSampling.jl
. We will assume that the monitored statistic in this case is the mean of the data, and you want that to be lower than a certain threshold otherwise some malfunctioning is expected.
julia> using StreamSampling, Statistics, Random
julia> function monitor(stream, thr)
rng = Xoshiro(42)
# we use a reservoir sample of 10^4 elements
rs = ReservoirSample{Int}(rng, 10^4)
# we loop over the stream and fit the data in the reservoir
for (i, e) in enumerate(stream)
fit!(rs, e)
# we check the mean value every 1000 iterations
if iszero(mod(i, 1000)) && mean(value(rs)) >= thr
return rs
end
end
end
We use some toy data for illustration
julia> stream = 1:10^8; # the data stream
julia> thr = 2*10^7; # the threshold for the mean monitoring
Then, we run the monitoring
julia> rs = monitor(stream, thr);
The number of observations until the detection is triggered is given by
julia> nobs(rs)
40009000
which is very close to the true value of 4*10^7 - 1
observations.
Note that in this case we could use an online mean methods, instead of holding all the sample into memory. However, the approach with the sample is more general because it allows to estimate any statistic about the stream.