Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch

Christian Janos Lebeda; Jakub Tetek

doi:10.1145/3665252.3665255

What is it about?

Suppose we get a long sequence of items – this can be anything from web page requests or sensor readings, to log entries of some software. One of the most common questions in this situation is “Which items are the most frequent”. But what if the sequence is so long that we do not want to store all of it, but we can only access the items one by one? Ways of answering this question were known, but they could potentially violate the users’ privacy. There also existed privacy-preserving ways to do this, but they required storing the whole sequence. In this paper, we provide an algorithm that uses a small amount of space, while also protecting users’ privacy (in the sense of differential privacy).

Photo by Campaign Creators on Unsplash

Why is it important?

Our algorithm can efficiently answer one of the most common questions people have about data streams while preserving users’ privacy.

Perspectives

We are hoping that this algorithm will be of use to many data analysts. We hope it will enable them to perform the needed data analysis while ensuring that their users’ privacy is protected.
Jakub Tetek
BARC, University of Copenhagen

This page is a summary of: Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch, ACM SIGMOD Record, May 2024, ACM (Association for Computing Machinery),
DOI: 10.1145/3665252.3665255.
You can read the full text:

Read

Resources

Open Access version
Better Differentially Private Approximate Histograms and Heavy Hitters using the Misra-Gries Sketch
Open access version of our paper on arXiv.

Contributors

The following have contributed to this page

Jakub Tetek
BARC, University of Copenhagen

Finding frequent items using a small amount of memory, while preserving privacy

What is it about?

Why is it important?

Perspectives

Resources