What is it about?

This paper examines the effect of representation-level augmentation methods, such as mixup and its variants, on software vulnerability detection, and proposes a masked variant that aims to increase effectiveness and reduce the loss of important information. It shows that these methods are not as effective as simply applying random oversampling to the vulnerable samples, although they do perform on par with more complex, state-of-the-art generative methods.
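The two kinds of augmentation compared above can be contrasted in a short sketch. It assumes each code sample has already been encoded into a fixed-length representation vector (for example, a model's hidden state) with a binary label, 1 for vulnerable. It shows standard mixup, which forms a convex combination of two representations and their labels with lam drawn from Beta(alpha, alpha), next to plain random oversampling of the vulnerable class. The function names, alpha = 0.2 and the uniform pairing scheme are illustrative assumptions, not the paper's exact setup, and the paper's masked variant is not reproduced here.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


def mixup(h_a: Sequence[float], y_a: float, h_b: Sequence[float], y_b: float,
          lam: float) -> Tuple[Vector, float]:
    """Convex combination of two representations and their labels (standard mixup)."""
    h = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y = lam * y_a + (1.0 - lam) * y_b  # soft label in [0, 1]
    return h, y


def mixup_augment(reps: Sequence[Sequence[float]], labels: Sequence[float],
                  n_new: int, alpha: float = 0.2,
                  seed: int = 0) -> Tuple[List[Vector], List[float]]:
    """Create n_new synthetic samples by mixing uniformly drawn pairs.

    Assumption: pairs are drawn from the whole training set; other pairing
    schemes (e.g. vulnerable samples only) are equally plausible.
    """
    rng = random.Random(seed)
    new_reps: List[Vector] = []
    new_labels: List[float] = []
    for _ in range(n_new):
        i = rng.randrange(len(reps))
        j = rng.randrange(len(reps))
        lam = rng.betavariate(alpha, alpha)  # mixing coefficient in [0, 1]
        h, y = mixup(reps[i], labels[i], reps[j], labels[j], lam)
        new_reps.append(h)
        new_labels.append(y)
    return new_reps, new_labels


def random_oversample(reps: Sequence[Sequence[float]], labels: Sequence[float],
                      minority: float = 1,
                      seed: int = 0) -> Tuple[List[Vector], List[float]]:
    """Duplicate randomly chosen minority-class samples until both classes match."""
    rng = random.Random(seed)
    minority_idx = [k for k, y in enumerate(labels) if y == minority]
    deficit = (len(labels) - len(minority_idx)) - len(minority_idx)
    out_reps = [list(r) for r in reps]
    out_labels = list(labels)
    if not minority_idx or deficit <= 0:
        return out_reps, out_labels  # nothing to duplicate, or already balanced
    for _ in range(deficit):
        k = rng.choice(minority_idx)  # sample with replacement
        out_reps.append(list(reps[k]))
        out_labels.append(labels[k])
    return out_reps, out_labels
```

For example, `mixup([0.0, 2.0], 0, [2.0, 0.0], 1, lam=0.25)` returns `([1.5, 0.5], 0.75)`. Oversampling only repeats real vulnerable samples, while mixup creates new points between existing ones with soft labels; the summary's finding is that the simpler of the two was the more effective.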

Why is it important?

It shows that generative methods do not necessarily beat more basic ways of creating new data points, and that random oversampling may be the more useful option for dealing with the shortage of vulnerable samples.

Perspectives

This is an interesting finding that complements the results of the VulScriber paper: it shows that non-LLM generative models are less effective than simpler alternatives at addressing the data shortage that exists in vulnerability datasets.

Seyed Shayan Daneshvar
University of Manitoba

Read the Original

This page is a summary of: A Study on Mixup-Inspired Augmentation Methods for Software Vulnerability Detection, June 2025, ACM (Association for Computing Machinery).
DOI: 10.1145/3756681.3757017.
