What is it about?

The unprecedented success of machine learning (ML) in many diverse applications has been inherently dependent on the increasing availability of computing power and large training datasets, under the implicit assumption that such datasets are representative of the data that will be encountered at test time. However, this assumption may be violated by data poisoning attacks, i.e., when attackers can either compromise the training data or gain some control over the learning process (e.g., when model training is outsourced to an untrusted third-party service). Although poisoning has been acknowledged as a relevant threat in industry applications, and a variety of attacks and defenses have been proposed so far, a complete systematization and critical review of the field is still missing. In this survey, we provide a comprehensive systematization of poisoning attacks and defenses in machine learning, reviewing more than 100 papers published in the field over the last 15 years. We start our review with a detailed discussion of threat modeling for poisoning attacks and of the underlying assumptions needed to defend against them, including a definition of the learning settings in which data poisoning attacks (and defenses) are possible. We then highlight the different attack strategies, which provide a scaffold for a detailed overview of data poisoning attacks. Next, we give an overview of the main defense mechanisms proposed to date, covering both training-time and test-time defense strategies, and we match them with the corresponding poisoning attacks they prevent. We also discuss poisoning research resources, such as libraries and datasets containing poisoned models. Finally, we review the historical development of poisoning attacks and defenses. This overview serves as a basis for discussing ongoing challenges in the field, such as limitations of current threat models, the design of more scalable attacks, and the arms race toward designing more comprehensive and effective defenses. For each of these points, we discuss open questions and related future work.
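
To make the idea of compromising the training data concrete, below is a minimal, hypothetical sketch of a label-flipping poisoning attack on a toy scikit-learn classifier. It is illustrative only and not code or an experiment from the survey; the toy dataset, the 20% poisoning rate, and the choice of logistic regression are all assumptions made for this example.

    # Minimal, hypothetical sketch of label-flipping data poisoning (illustrative only).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Toy dataset standing in for a clean training set.
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # Baseline model trained on clean data.
    clean_acc = LogisticRegression(max_iter=1000).fit(X_train, y_train).score(X_test, y_test)

    # The attacker flips the labels of 20% of the training points
    # (an indiscriminate, availability-style poisoning attack).
    rng = np.random.default_rng(0)
    poison_idx = rng.choice(len(y_train), size=int(0.2 * len(y_train)), replace=False)
    y_poisoned = y_train.copy()
    y_poisoned[poison_idx] = 1 - y_poisoned[poison_idx]

    # The same model retrained on the poisoned labels typically loses test accuracy.
    poisoned_acc = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned).score(X_test, y_test)
    print(f"clean accuracy:    {clean_acc:.3f}")
    print(f"poisoned accuracy: {poisoned_acc:.3f}")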

Why is it important?

According to Kumar et al. (2020), data poisoning is the threat most feared by companies working with machine learning, and it is therefore attracting increasing attention in the literature. We argue, however, that the literature on poisoning is often quite chaotic. The distinction between the different types of poisoning attacks is unclear, leading to unfair comparisons in experimental evaluations. In some works, for example, we see comparisons between attacks with essentially different goals (e.g., targeted versus backdoor attacks) and assumptions (e.g., attacks manipulating only a few data points versus attacks on the entire training set), which require different evaluations. Moreover, perhaps due to this chaotic landscape, some works introduce methodological novelties in staging poisoning attacks whose practical applicability remains unknown. A proper categorization of existing attacks based on their threat models is therefore needed to shed light on the field and overcome these difficulties.

Perspectives

We believe that shedding light on poisoning attacks and defenses remains extremely important to enable the development of reliable machine learning applications.

Antonio Cinà
CISPA Helmholtz Center for Information Security

Read the Original

This page is a summary of: Wild Patterns Reloaded: A Survey of Machine Learning Security against Training Data Poisoning, ACM Computing Surveys, July 2023, ACM (Association for Computing Machinery),
DOI: 10.1145/3585385.
You can read the full text via the DOI above.

