What is it about?
This paper examines the effect of representation-level augmentation methods, such as mixup and its variants, for software vulnerability detection, and also proposes a masked variant intended to increase effectiveness and reduce the loss of important information. It shows that these methods are not as effective as simple random oversampling of the vulnerable samples, although they do perform on par with complex state-of-the-art generative methods.
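To make the two families of techniques concrete, here is a minimal sketch in plain Python of standard mixup applied to representation vectors, next to random oversampling of the minority class. It is an illustration under stated assumptions, not the paper's implementation: the function names, the list-of-floats representation, the label convention (1 = vulnerable, 0 = non-vulnerable), and the default alpha value are all hypothetical, and the paper's masked variant is not shown.

```python
import random


def mixup(reps, labels, alpha=0.2, rng=None):
    """Standard mixup: blend each representation with a randomly paired one.

    reps is a list of equal-length float vectors; labels are 0/1.
    The mixing weight lam is drawn from Beta(alpha, alpha).
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(reps)))
    rng.shuffle(partners)  # random pairing of samples
    mixed_reps = [
        [lam * a + (1 - lam) * b for a, b in zip(reps[i], reps[j])]
        for i, j in enumerate(partners)
    ]
    # Labels are blended with the same weight, giving soft labels.
    mixed_labels = [
        lam * labels[i] + (1 - lam) * labels[j]
        for i, j in enumerate(partners)
    ]
    return mixed_reps, mixed_labels


def random_oversample(reps, labels, rng=None):
    """Duplicate vulnerable samples (label 1) at random until classes balance."""
    rng = rng or random.Random()
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    deficit = max(0, len(neg) - len(pos))
    # Sample with replacement; nothing is added if there are no positives.
    extra = [rng.choice(pos) for _ in range(deficit)] if pos else []
    keep = list(range(len(labels))) + extra
    return [reps[i] for i in keep], [labels[i] for i in keep]
```

Mixup creates new points between existing ones and produces soft labels, whereas oversampling only repeats real vulnerable samples unchanged, which is the simpler baseline the summary refers to.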
Why is it important?
It shows that generative methods do not necessarily outperform more basic ways of creating new data points, and that random oversampling may be the more useful option for dealing with the shortage of vulnerable samples.
Perspectives
An interesting finding that complements the results of the VulScriber paper. It shows that generative models (non-LLMs) are not as effective at addressing the data shortage problem that exists in vulnerability datasets.
Seyed Shayan Daneshvar
University of Manitoba
Read the Original
This page is a summary of: A Study on Mixup-Inspired Augmentation Methods for Software Vulnerability Detection, June 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3756681.3757017.