What is it about?
Suppose we get a long sequence of items – this can be anything from web page requests or sensor readings, to log entries of some software. One of the most common questions in this situation is “Which items are the most frequent”. But what if the sequence is so long that we do not want to store all of it, but we can only access the items one by one? Ways of answering this question were known, but they could potentially violate the users’ privacy. There also existed privacy-preserving ways to do this, but they required storing the whole sequence. In this paper, we provide an algorithm that uses a small amount of space, while also protecting users’ privacy (in the sense of differential privacy).
Featured Image
Photo by Campaign Creators on Unsplash
Why is it important?
Our algorithm can efficiently answer one of the most common questions people have about data streams while preserving users’ privacy.
Perspectives
We are hoping that this algorithm will be of use to many data analysts. We hope it will enable them to perform the needed data analysis while ensuring that their users’ privacy is protected.
Jakub Tetek
BARC, University of Copenhagen
Read the Original
This page is a summary of: Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch, ACM SIGMOD Record, May 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3665252.3665255.
You can read the full text:
Resources
Contributors
The following have contributed to this page







