What is it about?

Software vulnerabilities are defects in software that can be exploited by attackers to harm the security, integrity, or availability of systems. Researchers have been using deep learning-based models to detect such vulnerabilities, but the available datasets are small, which makes training such models challenging. In our work, we use large language models (LLMs) to create new vulnerable samples by designing novel prompts and applying prompt engineering techniques. We design three different strategies, two of which, Injection and Extension, outperform state-of-the-art methods including VGX, VulGen, and ROS.

Why is it important?

This is the first work in vulnerability detection that leverages large language models for augmenting source code. It is also the first to study the effect of adding extra clean samples to datasets when vulnerable samples are augmented. Finally, it is the first work to use the complex and highly imbalanced PrimeVul dataset and to offer a way to improve the performance of models trained on it.

Perspectives

This paper employs RAG (retrieval-augmented generation) in a zero-shot setting with novel prompts. Specifically, the prompts contain placeholders for one or two code pieces that are retrieved by the retriever of the RAG module. With these, we build a pipeline for augmenting software vulnerabilities with high diversity and good quality, so that they improve the performance of deep learning-based vulnerability detection models.
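The placeholder-filling idea above can be sketched in a few lines. This is a minimal illustration, not the paper's actual implementation: the template wording, the token-overlap retriever, and the names `retrieve_similar`, `augment`, and `llm_generate` are all hypothetical stand-ins (a real pipeline would use a proper retriever such as BM25 or dense embeddings, and a real LLM API).

```python
# Hypothetical sketch of a RAG-style prompt with placeholders for
# retrieved code pieces; all names here are illustrative, not from the paper.
import re
from typing import Callable, List

# A prompt template with placeholders for two code pieces, mirroring the
# "one or two code pieces" filled in by the retriever.
PROMPT_TEMPLATE = (
    "Vulnerable sample:\n{vulnerable}\n\n"
    "Clean sample:\n{clean}\n\n"
    "Generate a new, realistic vulnerable variant of the clean sample."
)

def retrieve_similar(query: str, corpus: List[str]) -> str:
    """Toy retriever: return the corpus entry sharing the most word tokens
    with the query (a stand-in for the RAG module's retriever)."""
    q = set(re.findall(r"\w+", query))
    return max(corpus, key=lambda c: len(q & set(re.findall(r"\w+", c))))

def augment(vulnerable: str, clean_corpus: List[str],
            llm_generate: Callable[[str], str]) -> str:
    """Fill the placeholders with the vulnerable sample and a retrieved
    clean sample, then ask the LLM for a new vulnerable sample."""
    clean = retrieve_similar(vulnerable, clean_corpus)
    prompt = PROMPT_TEMPLATE.format(vulnerable=vulnerable, clean=clean)
    return llm_generate(prompt)

# Usage with a stub in place of a real LLM call:
corpus = [
    "int read_buf(char *b) { strcpy(b, input); }",
    "void log_msg(const char *m) { printf(\"%s\", m); }",
]
stub_llm = lambda prompt: "// generated variant for prompt:\n" + prompt
sample = "char buf[8]; strcpy(buf, user_input);"
print(augment(sample, corpus, stub_llm))
```

The design point is that the prompt, not the model, carries the retrieved context: swapping the retriever or the template changes the augmentation strategy without retraining anything.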

Seyed Shayan Daneshvar
University of Manitoba

Read the Original

This page is a summary of: VulScribeR: Exploring RAG-based Vulnerability Augmentation with LLMs, ACM Transactions on Software Engineering and Methodology, August 2025, ACM (Association for Computing Machinery),
DOI: 10.1145/3760775.
