What is it about?

Seed scheduling is a critical step in greybox fuzzing. During seed selection, it assigns different weights to seed test cases, and it has a significant impact on fuzzing efficiency. Existing seed scheduling strategies rely on manually designed models to estimate the potential of each seed and determine its weight. These models fail to capture much of the rich information about a seed and its execution, so their estimates of seed potential are suboptimal.
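To make the role of seed weights concrete, here is a minimal Python sketch of weighted seed selection. It assumes each seed already carries an estimated potential, however that estimate was produced. The `Seed` class and `select_seed` helper are hypothetical and are not taken from AFLplusplus or Graphuzz; real fuzzers combine selection with mutation budgets in more involved ways.

```python
import random
from dataclasses import dataclass


@dataclass
class Seed:
    name: str
    potential: float  # estimated potential; producing this estimate is the hard part


def select_seed(seeds: list[Seed], rng: random.Random) -> Seed:
    """Pick one seed with probability proportional to its estimated potential."""
    weights = [max(s.potential, 0.0) for s in seeds]  # clamp negative estimates to zero
    if sum(weights) == 0:
        # No informative estimates: fall back to uniform selection.
        return rng.choice(seeds)
    return rng.choices(seeds, weights=weights, k=1)[0]


rng = random.Random(0)  # fixed seed so runs are reproducible
seeds = [Seed("a", 0.0), Seed("b", 3.0), Seed("c", 1.0)]
picks = [select_seed(seeds, rng).name for _ in range(1000)]
```

In this sketch, seed `b` is chosen roughly three times as often as `c`, and `a`, whose estimated potential is zero, is never chosen. A better potential estimate therefore translates directly into how fuzzing time is spent.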

Why is it important?

In this paper, we present Graphuzz, a data-driven seed scheduling solution for coverage-guided greybox fuzzing that uses deep learning models to estimate seed potentials. Specifically, we propose an extended control flow graph, called e-CFG, that represents the control-flow and data-flow characteristics of a seed's execution in a form that graph neural networks (GNNs) can process to estimate the seed's potential. We measure the code coverage increment of each seed and use it as the label for training the GNN model. We further add a self-attention mechanism to the GNN model so that it can capture features that would otherwise be overlooked. We have implemented a prototype of Graphuzz on top of the baseline fuzzer AFLplusplus.

The evaluation results show that our model can estimate seed potentials and generalizes robustly to different targets. On 12 benchmarks from FuzzBench, Graphuzz achieves higher code coverage than AFLplusplus, the state-of-the-art seed scheduling solution K-Scheduler, and other coverage-guided fuzzers. On 8 benchmarks from Magma, Graphuzz outperforms both the baseline fuzzer AFLplusplus and state-of-the-art solutions in bug detection.
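The pipeline described above can be illustrated with a small, dependency-free Python sketch. It covers a graph of basic blocks, one GNN message-passing layer, self-attention across nodes, a scalar potential score, and the coverage-increment label. Everything here is an assumption for illustration and is not Graphuzz's actual e-CFG encoding or model architecture. The node features, the single layer, the parameter-free self-attention (queries, keys, and values are the node embeddings themselves), the mean-pooled readout, and the hand-picked weights are all hypothetical. In a trained model, the weights would be learned by regressing the predicted score onto the label.

```python
import math

# Hypothetical e-CFG for one seed execution: basic blocks with illustrative
# features [normalized hit count, covered flag, comparison operands observed].
features = {
    "b0": [1.0, 1.0, 0.0],
    "b1": [0.5, 1.0, 2.0],
    "b2": [0.0, 0.0, 1.0],
}
edges = [("b0", "b1"), ("b0", "b2"), ("b1", "b2")]  # directed control-flow edges


def matvec(W, x):
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def message_passing(feats, edges, W_self, W_nei):
    """One GNN layer: ReLU(W_self . h_v + W_nei . mean of predecessor features)."""
    preds = {n: [] for n in feats}
    for src, dst in edges:
        preds[dst].append(src)
    dim = len(next(iter(feats.values())))
    out = {}
    for n, h in feats.items():
        ps = preds[n]
        mean = [sum(feats[p][i] for p in ps) / len(ps) if ps else 0.0 for i in range(dim)]
        combined = [a + b for a, b in zip(matvec(W_self, h), matvec(W_nei, mean))]
        out[n] = [max(0.0, v) for v in combined]
    return out


def self_attention(hidden):
    """Parameter-free scaled dot-product self-attention over all nodes (Q = K = V = h)."""
    nodes = list(hidden)
    d = len(hidden[nodes[0]])
    out = {}
    for q in nodes:
        scores = [sum(a * b for a, b in zip(hidden[q], hidden[k])) / math.sqrt(d) for k in nodes]
        m = max(scores)  # subtract the max for a numerically stable softmax
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        out[q] = [sum(e / z * hidden[k][i] for e, k in zip(exps, nodes)) for i in range(d)]
    return out


def predict_potential(feats, edges, W_self, W_nei, w_out):
    """GNN layer, then self-attention, then mean pooling, then a linear output."""
    h = self_attention(message_passing(feats, edges, W_self, W_nei))
    pooled = [sum(v[i] for v in h.values()) / len(h) for i in range(len(w_out))]
    return sum(w * p for w, p in zip(w_out, pooled))


def coverage_increment(seed_edges, covered_edges):
    """Training label: number of edges this seed covers that were not covered before."""
    return len(set(seed_edges) - set(covered_edges))


W_self = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # hand-picked, untrained weights
W_nei = [[0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]
w_out = [0.5, 0.5]

hidden = message_passing(features, edges, W_self, W_nei)
score = predict_potential(features, edges, W_self, W_nei, w_out)
label = coverage_increment({"e1", "e2", "e3"}, {"e1"})  # two new edges
squared_error = (score - label) ** 2  # the quantity a regression-style training loop would minimize
```

With trained weights, the predicted score could serve as the weight the scheduler assigns to the seed. Using the graph as input, rather than a few hand-picked statistics, is what lets the model draw on the structure of the seed's execution.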

Perspectives

We conducted this research to explore the role of artificial intelligence, particularly deep learning, in improving greybox fuzzing. Our results show that neural networks can indeed improve seed scheduling, and we believe further research in this direction will yield more results.

Hang Xu
Key Laboratory of Cyberspace Security, Ministry of Education, China

Read the Original

This page is a summary of: Graphuzz: Data-driven Seed Scheduling for Coverage-guided Greybox Fuzzing, ACM Transactions on Software Engineering and Methodology, May 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3664603.