What is it about?
Reservoir sampling is a method for drawing a random sample from a stream of elements: incoming items are placed in a fixed-size reservoir with a certain probability, possibly replacing items already stored there. In the classic scheme this probability treats all elements alike, so some elements displace others regardless of how informative they are. This motivates a weighted scheme that selects values according to their importance.
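To make the idea concrete, here is a minimal sketch of weighted reservoir sampling using the well-known A-Res algorithm of Efraimidis and Spirakis; this illustrates the general technique, not necessarily the exact scheme developed in the paper. Each item receives the random key u^(1/weight), and the k items with the largest keys are kept in a min-heap.

```python
import heapq
import random

def weighted_reservoir_sample(stream, k, rng=None):
    """Illustrative A-Res weighted reservoir sampling (Efraimidis & Spirakis):
    each (item, weight) pair gets the key u**(1/weight) for uniform u in (0,1);
    the k items with the largest keys form the sample."""
    rng = rng or random.Random()
    heap = []  # min-heap of (key, item); heap[0] holds the smallest key
    for item, weight in stream:
        key = rng.random() ** (1.0 / weight)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            # New item out-competes the current weakest reservoir entry.
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]

# Items with larger weights are proportionally more likely to survive.
stream = [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 10.0), ("e", 2.0)]
sample = weighted_reservoir_sample(stream, k=2, rng=random.Random(42))
```

A single pass over the stream suffices, and the reservoir never holds more than k items, which is what makes the approach attractive for unbounded streams.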
Featured Image
Photo by Matthew Feeney on Unsplash
Why is it important?
Selecting representative values from a data stream is challenging because the stream keeps growing as it is processed and the time window expands. The weighted scheme presented in this work therefore quantifies the importance of selected values relative to those that do not influence the sampling mechanism. The most representative values are then clustered to reflect the distribution of the elements and to identify events and outliers. Finally, a probabilistic stream state graph is constructed as a representation of the data.
Perspectives
In my opinion, this paper presents the relevant theoretical and structural components of a weighted framework that can select, cluster, and represent significant elements from an unbiased population. I hope this work helps researchers expand their knowledge of reservoir-based approaches and supports future research in the field of stream processing.
Christos Karras
University of Patras
Read the Original
This page is a summary of: Weighted Reservoir Sampling On Evolving Streams: A Sampling Algorithmic Framework For Stream Event Identification, September 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3549737.3549767.
You can read the full text: