What is it about?

We used an approach from privacy-preserving record linkage to encode training samples for the machine-learning-based detection of algorithmically generated domain names, which botnets use to enable communication. The evaluated approach preserves the similarity of data samples, as required, while allowing encodings to be tuned with regard to the privacy-utility trade-off. We discuss the requirements of different machine learning scenarios and the privacy implications of this encoding approach for those scenarios. We further evaluated the encoding approach by training deep learning models on encodings generated with different parameter values and comparing their performance to that of a model trained on cleartext samples.
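To make the idea of a similarity-preserving Bloom encoding more concrete, here is a minimal sketch in Python. It is not the implementation from the paper: the use of character bigrams, the SHA-256-based hashing, the padding character, and the parameter names `m` (filter length) and `k` (hash functions per bigram) are all assumptions chosen for illustration, and the helper names `bigrams`, `bloom_encode`, and `dice` are hypothetical. In schemes of this kind, parameters such as `m` and `k` are typical knobs for trading privacy against utility; whether these are the parameters varied in the paper is likewise an assumption.

```python
import hashlib


def bigrams(text):
    """Return the set of character bigrams of `text`, padded on both sides.

    The "_" padding character is an illustrative assumption.
    """
    padded = f"_{text}_"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def bloom_encode(domain, m=256, k=4):
    """Encode a domain name as a Bloom filter of length `m`.

    Each bigram sets up to `k` bit positions. The positions are derived
    from SHA-256 with a per-function seed prefix, which is an assumed
    hashing scheme and not necessarily the one used in the paper.
    """
    bits = [0] * m
    for gram in bigrams(domain.lower()):
        for seed in range(k):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            index = int.from_bytes(digest[:8], "big") % m
            bits[index] = 1
    return bits


def dice(a, b):
    """Dice coefficient of two equal-length bit vectors."""
    common = sum(x & y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 2 * common / total if total else 0.0


if __name__ == "__main__":
    base = bloom_encode("example.com")
    near = bloom_encode("examp1e.com")      # differs by one character
    far = bloom_encode("xkqzvbtrw.net")     # shares no bigrams with base
    print("similar domains:   ", round(dice(base, near), 3))
    print("dissimilar domains:", round(dice(base, far), 3))
```

Because similar strings share most of their bigrams, their encodings share most of their set bits, so a model trained on the bit vectors can still pick up on string similarity without seeing the cleartext. Any overlap between unrelated domains in this sketch comes only from hash collisions.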


Why is it important?

Machine learning has become the go-to solution for many classification tasks. However, its use in scenarios involving sensitive training data, together with the rise of privacy regulations such as the GDPR, has led to concerns about potential leakage of sensitive information. We contribute to a better understanding of privacy approaches for machine learning by evaluating an approach from privacy-preserving record linkage in the cybersecurity use case of detecting algorithmically generated domains via deep learning. We hope that building bridges between these research areas helps to find innovative solutions for technical privacy protection.

Read the Original

This page is a summary of: DGA Detection Using Similarity-Preserving Bloom Encodings, June 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3590777.3590795.

Contributors

The following have contributed to this page