Introduction
StreamSampling (Module)
StreamSampling.jl
This package provides general methods to sample from any stream in a single pass through the data, even when the number of items in the stream is unknown.
This has some advantages over other sampling procedures:
- If the iterable is lazy, the memory required is either a small constant or grows with the size of the sample, rather than with the size of the whole population.
- With reservoir methods, at any point of the sampling process the collected sample is a random sample of the portion of the stream seen thus far.
- In some cases, sampling with the techniques implemented in this library can bring considerable performance gains, since the population of items does not need to be stored in memory beforehand.
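To make the single-pass idea concrete, here is a minimal sketch of the classic reservoir technique (often called Algorithm R) written with only the `Random` standard library. It is an illustration of the general approach, not the implementation used by StreamSampling.jl; the function name `reservoir_sample` is hypothetical and is not part of the package's API.

```julia
using Random

# Hypothetical helper for illustration only (not part of StreamSampling.jl).
# Draws up to `n` items without replacement from `iter` in a single pass,
# without knowing the length of `iter` in advance.
function reservoir_sample(rng::AbstractRNG, iter, n::Integer)
    reservoir = Vector{eltype(iter)}()
    sizehint!(reservoir, n)
    for (i, x) in enumerate(iter)
        if i <= n
            # Fill the reservoir with the first n items.
            push!(reservoir, x)
        else
            # Item i replaces a random slot with probability n / i.
            j = rand(rng, 1:i)
            if j <= n
                reservoir[j] = x
            end
        end
    end
    return reservoir
end

rng = Xoshiro(42)
stream = (x^2 for x in 1:10^6)   # lazy generator, never materialized
sample = reservoir_sample(rng, stream, 5)
```

Only the `n` slots of the reservoir are held in memory, and stopping the loop early would still leave a uniform sample of the items seen up to that point, which is the property described in the list above.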
For information about the available functionality, consult the documentation.
Contributing
Contributions are welcome! If you encounter any issues, have suggestions for improvements, or would like to add new features, feel free to open an issue or submit a pull request.
Installation
using Pkg
Pkg.add("StreamSampling")
Reproducibility
The documentation of StreamSampling.jl was built using these direct dependencies,
Status `~/work/StreamSampling.jl/StreamSampling.jl/docs/Project.toml`
[6e4b80f9] BenchmarkTools v1.6.0
[e30172f5] Documenter v1.14.1
[ff63dad9] StreamSampling v0.7.4 `~/work/StreamSampling.jl/StreamSampling.jl`
[9a3f8284] Random v1.11.0
and using this machine and Julia version.
Julia Version 1.11.6
Commit 9615af0f269 (2025-07-09 12:58 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 4 × AMD EPYC 7763 64-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 2 default, 0 interactive, 1 GC (on 4 virtual cores)
A more complete overview of all dependencies and their versions is also provided.
Status `~/work/StreamSampling.jl/StreamSampling.jl/docs/Manifest.toml`
[a4c015fc] ANSIColoredPrinters v0.0.1
[1520ce14] AbstractTrees v0.4.5
[7d9f7c33] Accessors v0.1.42
[66dad0bd] AliasTables v1.1.3
[6e4b80f9] BenchmarkTools v1.6.0
[944b1d66] CodecZlib v0.7.8
[34da2185] Compat v4.18.0
[a33af91c] CompositionsBase v0.1.2
[187b0558] ConstructionBase v1.6.0
[9a962f9c] DataAPI v1.16.0
⌅ [864edb3b] DataStructures v0.18.22
[31c24e10] Distributions v0.25.120
[ffbed154] DocStringExtensions v0.9.5
[e30172f5] Documenter v1.14.1
[1a297f60] FillArrays v1.13.0
[d7ba0133] Git v1.4.0
[49057fa9] HybridStructs v0.2.1
[34004b35] HypergeometricFunctions v0.3.28
[b5f81e59] IOCapture v0.2.5
[3587e190] InverseFunctions v0.1.17
[92d709cd] IrrationalConstants v0.2.4
[692b3bcd] JLLWrappers v1.7.1
[682c06a0] JSON v0.21.4
[0e77f7df] LazilyInitializedFields v1.3.0
[2ab3a3ac] LogExpFunctions v0.3.29
[1914dd2f] MacroTools v0.5.16
[d0879d2d] MarkdownAST v0.1.2
[e1d29d7a] Missings v1.2.0
[925886fa] OnlineStatsBase v1.7.1
[bac558e1] OrderedCollections v1.8.1
[90014a1f] PDMats v0.11.35
[69de0a69] Parsers v2.8.3
⌅ [aea7be01] PrecompileTools v1.2.1
[21216c6a] Preferences v1.5.0
[43287f4e] PtrArrays v1.3.0
[1fd47b50] QuadGK v2.11.2
[189a3867] Reexport v1.2.2
[2792f1a3] RegistryInstances v0.1.0
[79098fc4] Rmath v0.8.0
[a2af1166] SortingAlgorithms v1.2.2
[276daf66] SpecialFunctions v2.5.1
[10745b16] Statistics v1.11.1
[82ae8749] StatsAPI v1.7.1
[2913bbd2] StatsBase v0.34.6
[4c63d2b9] StatsFuns v1.5.0
[ff63dad9] StreamSampling v0.7.4 `~/work/StreamSampling.jl/StreamSampling.jl`
[3bb67fe8] TranscodingStreams v0.11.3
[2e619515] Expat_jll v2.6.5+0
[f8c6e375] Git_jll v2.50.1+0
[94ce4f54] Libiconv_jll v1.18.0+0
[9bd350c2] OpenSSH_jll v10.0.1+0
[458c3c95] OpenSSL_jll v3.5.2+0
[efe28fd5] OpenSpecFun_jll v0.5.6+0
[f50d1b31] Rmath_jll v0.5.1+0
[0dad84c5] ArgTools v1.1.2
[56f22d72] Artifacts v1.11.0
[2a0f44e3] Base64 v1.11.0
[ade2ca70] Dates v1.11.0
[f43a241f] Downloads v1.6.0
[7b1f6079] FileWatching v1.11.0
[b77e0a4c] InteractiveUtils v1.11.0
[b27032c2] LibCURL v0.6.4
[76f85450] LibGit2 v1.11.0
[8f399da3] Libdl v1.11.0
[37e2e46d] LinearAlgebra v1.11.0
[56ddb016] Logging v1.11.0
[d6f4376e] Markdown v1.11.0
[a63ad114] Mmap v1.11.0
[ca575930] NetworkOptions v1.2.0
[44cfe95a] Pkg v1.11.0
[de0858da] Printf v1.11.0
[9abbd945] Profile v1.11.0
[3fa0cd96] REPL v1.11.0
[9a3f8284] Random v1.11.0
[ea8e919c] SHA v0.7.0
[9e88b42a] Serialization v1.11.0
[6462fe0b] Sockets v1.11.0
[2f01184e] SparseArrays v1.11.0
[f489334b] StyledStrings v1.11.0
[4607b0f0] SuiteSparse
[fa267f1f] TOML v1.0.3
[a4e569a6] Tar v1.10.0
[8dfed614] Test v1.11.0
[cf7118a7] UUIDs v1.11.0
[4ec0a83e] Unicode v1.11.0
[e66e0078] CompilerSupportLibraries_jll v1.1.1+0
[deac9b47] LibCURL_jll v8.6.0+0
[e37daf67] LibGit2_jll v1.7.2+0
[29816b5a] LibSSH2_jll v1.11.0+1
[c8ffd9c3] MbedTLS_jll v2.28.6+0
[14a3606d] MozillaCACerts_jll v2023.12.12
[4536629a] OpenBLAS_jll v0.3.27+1
[05823500] OpenLibm_jll v0.8.5+0
[efcefdf7] PCRE2_jll v10.42.0+1
[bea87d4a] SuiteSparse_jll v7.7.0+0
[83775a58] Zlib_jll v1.2.13+1
[8e850b90] libblastrampoline_jll v5.11.0+0
[8e850ede] nghttp2_jll v1.59.0+0
[3f19e933] p7zip_jll v17.4.0+2
Info Packages marked with ⌅ have new versions available but compatibility constraints restrict them from upgrading. To see why use `status --outdated -m`
You can also download the manifest file and the project file.