What is it about?

Image classifiers often fail when exposed to small adversarial changes to their inputs. Defenses are usually evaluated by measuring accuracy on test images perturbed within a fixed budget. However, these evaluations tend to focus on specific attack types and overlook the flexibility of real-world attackers, who may adapt or combine methods. In this study, we show that even models deemed robust to strong attacks like AutoAttack can be fooled by a slight variation of the simpler FGSM attack, in which the perturbation is shifted before being applied. Our results highlight that current defenses may not generalize well beyond narrow threat models and may overstate their robustness.
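
To make the idea concrete, here is a minimal sketch (not the authors' code) of an FGSM-style attack in which the sign-gradient perturbation is spatially shifted before being added to the image. The model, inputs, budget epsilon, and shift amount are placeholders, and the exact transformation studied in the paper may differ.

```python
# Hypothetical sketch of a "shifted" FGSM attack, assuming a PyTorch image
# classifier `model`, inputs `x` with pixel values in [0, 1], and labels `y`.
import torch
import torch.nn.functional as F

def shifted_fgsm(model, x, y, epsilon=8 / 255, shift=(1, 1)):
    """Compute a standard FGSM perturbation, then shift it before applying."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()

    delta = epsilon * x.grad.sign()                         # standard FGSM step
    delta = torch.roll(delta, shifts=shift, dims=(-2, -1))  # translate the perturbation
    return torch.clamp(x + delta, 0.0, 1.0).detach()        # keep pixels in valid range
```

The only change from plain FGSM here is the line that rolls the perturbation; the broader point of the paper is that such small, inexpensive modifications can already defeat defenses that were evaluated only against the unmodified attack.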


Why is it important?

This work is important because it reveals a gap between how we currently test model robustness and how attacks may happen in the real world. Many defenses seem effective because they’re tested against fixed, well-known attacks. But real attackers can adapt and modify their methods in ways that aren't covered by these standard tests. This study shows that even strong models can be fooled by small changes to simple attacks, suggesting that their robustness may not hold up outside controlled settings. As a result, current defenses might give a false sense of security. Highlighting these weaknesses is a step toward developing models that are robust against a broader range of threats, making AI systems more reliable and secure in practice.

Perspectives

There's a need for adaptive and transformation-aware evaluations, inspired by real-world image distortions. Furthermore, given the promising performance of diffusion-based purification techniques, developing faster and more efficient purification methods is essential to enable their use in practical, real-time settings.

Fatemeh Amerehi
University of Limerick

Read the Original

This page is a summary of: Robust Image Classifiers Fail Under Shifted Adversarial Perturbations, August 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3704268.3742694.
You can read the full text via the DOI above.

