What is it about?
The paper explores a new way to defend machine-learning models against so-called “black-box” adversarial attacks, in which an attacker sees only the model’s outputs and iteratively tweaks inputs to fool it. Rather than trying to clean or obscure incoming data, the defense generates a “counter-sample” for every query: it takes the attacker’s input and nudges it back toward its correct label before passing it through the model. Because the defense always responds with these optimized counter-samples instead of the original queries, the attacker’s search direction becomes misleading: the estimated gradients no longer point toward a successful adversarial example, and the attack tends to fail, while the model’s accuracy on genuine inputs remains intact. In essence, the defense introduces an asymmetry into query-based attacks by weaponizing the defender’s ability to run fast, targeted optimizations that the attacker, limited to black-box probing, cannot counter.
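To make the mechanism concrete, here is a minimal sketch of such a counter-sample wrapper for a PyTorch image classifier. The step count, step size, signed-gradient update, and [0, 1] pixel range are illustrative assumptions rather than the paper’s exact procedure; since the true label is unknown at query time, the sketch uses the model’s own prediction as a stand-in, consistent with a stateless defense.

```python
import torch
import torch.nn.functional as F

def respond_with_counter_sample(model, x, steps=5, step_size=0.01):
    # Stateless defense: for each incoming query x, build a counter-sample
    # and return the model's output on it instead of on x itself.
    model.eval()
    with torch.no_grad():
        # The true label is unknown at query time, so the model's own
        # prediction stands in for the "correct" label (an assumption
        # of this sketch).
        y_hat = model(x).argmax(dim=1)

    x_cs = x.clone().detach()
    for _ in range(steps):
        x_cs.requires_grad_(True)
        loss = F.cross_entropy(model(x_cs), y_hat)
        (grad,) = torch.autograd.grad(loss, x_cs)
        # Descend the loss: nudge the input *toward* its predicted label.
        # A signed-gradient step and [0, 1] clamp are used for simplicity.
        x_cs = (x_cs - step_size * grad.sign()).detach().clamp(0.0, 1.0)

    with torch.no_grad():
        # The attacker only ever observes this output, so gradient
        # estimates built from repeated queries become misleading.
        return model(x_cs)
```

Genuine inputs the model already classifies confidently are barely moved by these steps, which is why clean accuracy stays intact; an attacker probing around an input, by contrast, receives answers for points that have been pulled back toward the decision region of the predicted class.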
Featured Image
Photo by Tamara Gak on Unsplash
Read the Original
This page is a summary of: Counter-Samples: A Stateless Strategy to Neutralize Black-Box Adversarial Attacks, ACM Transactions on Intelligent Systems and Technology, August 2025, ACM (Association for Computing Machinery). DOI: 10.1145/3744657.