What is it about?

Reservoir sampling is a method for drawing a sample of elements from a data stream: incoming elements are placed into a fixed-size reservoir with a certain probability. In the classical scheme this probability does not depend on the element itself, so newly arriving elements may displace others already held in the reservoir regardless of how informative those elements are. This motivates a weighted scheme that selects values according to their importance.
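
As a concrete illustration (a minimal sketch, not the paper's exact framework), the snippet below shows one well-known weighted reservoir scheme, the Efraimidis-Spirakis A-Res algorithm: each element with weight w receives the random key u**(1/w), with u drawn uniformly from (0, 1), and the k elements with the largest keys are kept. The stream contents and weights in the example are hypothetical.

    import heapq
    import random

    def weighted_reservoir_sample(stream, k):
        # Keep k items from (item, weight) pairs, favouring larger weights.
        # A-Res: key = u ** (1 / w); retain the k items with the largest keys.
        reservoir = []  # min-heap of (key, item); the smallest key sits at index 0
        for item, weight in stream:
            key = random.random() ** (1.0 / weight)
            if len(reservoir) < k:
                heapq.heappush(reservoir, (key, item))
            elif key > reservoir[0][0]:
                heapq.heapreplace(reservoir, (key, item))
        return [item for _, item in reservoir]

    # Hypothetical stream: higher-weight items are more likely to remain sampled.
    stream = [("a", 1.0), ("b", 5.0), ("c", 0.5), ("d", 3.0), ("e", 2.0)]
    print(weighted_reservoir_sample(stream, 3))

Because each element is examined once and only the small heap is kept in memory, this kind of scheme fits the single-pass, bounded-memory setting of stream processing.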


Why is it important?

Selecting representative values from a data stream is challenging because the stream keeps growing as it is processed and the time window expands. The weighted scheme presented in this work assigns importance to candidate values, so that significant elements are retained while less relevant ones do not influence the sampling mechanism. The sampled values that best reflect the distribution of the elements are then clustered in order to identify events and outliers. Finally, a probabilistic stream state graph is constructed to represent the data.
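
To give a flavour of the clustering step (again only a rough sketch, not the authors' actual procedure), the snippet below groups sampled values with a tiny one-dimensional k-means and flags values far from every cluster centre as candidate events or outliers; the sample values, the number of clusters, and the distance threshold are all assumptions made purely for illustration.

    import random

    def kmeans_1d(values, k, iters=20):
        # Minimal 1-D k-means: alternate assignment and centre updates.
        centres = random.sample(values, k)
        for _ in range(iters):
            clusters = [[] for _ in range(k)]
            for v in values:
                idx = min(range(k), key=lambda i: abs(v - centres[i]))
                clusters[idx].append(v)
            centres = [sum(c) / len(c) if c else centres[i]
                       for i, c in enumerate(clusters)]
        return centres

    def flag_outliers(values, centres, threshold):
        # Values far from every cluster centre are treated as candidate outliers.
        return [v for v in values if min(abs(v - c) for c in centres) > threshold]

    sampled = [1.1, 0.9, 1.0, 5.2, 5.0, 4.9, 12.0]  # hypothetical reservoir contents
    centres = kmeans_1d(sampled, k=2)
    print(centres, flag_outliers(sampled, centres, threshold=2.0))

In the paper, the clustered sample then feeds the probabilistic stream state graph; that construction is beyond this sketch.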

Perspectives

In my opinion, this paper presents the relevant theoretical and structural components of a weighted framework that can select, cluster, and represent significant elements from an unbiased population. I hope this work helps researchers expand their knowledge of such reservoir-based approaches and supports possible future research in the field of stream processing.

Christos Karras
University of Patras

Read the Original

This page is a summary of: Weighted Reservoir Sampling On Evolving Streams: A Sampling Algorithmic Framework For Stream Event Identification, September 2022, ACM (Association for Computing Machinery),
DOI: 10.1145/3549737.3549767.
You can read the full text via the DOI above.
