ProphetFuzz: Fully Automated Prediction and Fuzzing of High-Risk Option Combinations with Only Documentation via Large Language Model

Dawei Wang, Geng Zhou, Li Chen, Dan Li, Yukai Miao
DOI: https://doi.org/10.1145/3658644.3690231
2024-09-02
Abstract: Vulnerabilities related to option combinations pose a significant challenge in software security testing due to their vast search space. Previous research primarily addressed this challenge through mutation or filtering techniques, which inefficiently treated all option combinations as having equal potential for vulnerabilities, thus wasting considerable time on non-vulnerable targets and resulting in low testing efficiency. In this paper, we utilize carefully designed prompt engineering to drive the large language model (LLM) to predict high-risk option combinations (i.e., more likely to contain vulnerabilities) and perform fuzz testing automatically without human intervention. We developed a tool called ProphetFuzz and evaluated it on a dataset comprising 52 programs collected from three related studies. The entire experiment consumed 10.44 CPU years. ProphetFuzz successfully predicted 1748 high-risk option combinations at an average cost of only $8.69 per program. Results show that after 72 hours of fuzzing, ProphetFuzz discovered 364 unique vulnerabilities associated with 12.30% of the predicted high-risk option combinations, which was 32.85% higher than that found by the state of the art in the same timeframe. Additionally, using ProphetFuzz, we conducted persistent fuzzing on the latest versions of these programs, uncovering 140 vulnerabilities, with 93 confirmed by developers and 21 awarded CVE numbers.
Cryptography and Security
What problem does this paper attempt to address?
The paper aims to address the issue of vulnerability detection related to option combinations in software security testing. Specifically, due to the vast number of option combinations, traditional methods such as mutation or filtering techniques are inefficient in handling these combinations because they do not distinguish which combinations are more likely to contain vulnerabilities. The paper proposes a method based on large language models (LLM) called ProphetFuzz, which can automatically predict high-risk option combinations and perform fuzz testing. The main objectives of ProphetFuzz are:

1. **Improve testing efficiency**: By predicting high-risk option combinations that are more likely to contain vulnerabilities, it reduces the testing time for non-vulnerable targets.
2. **Automate the testing process**: The entire process requires no manual intervention, from document parsing to generating test commands and files, and then to executing fuzz testing.
3. **Address issues with existing methods**: Overcome problems in traditional methods such as lack of prioritization, semantic mismatches, and high dependency on experts.

Through this approach, researchers hope to discover more vulnerabilities during the software testing process and improve overall testing efficiency. Experimental results show that ProphetFuzz found more unique vulnerabilities than existing technologies within the same time frame and at a lower cost.
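To make the search-space argument concrete, the following is a minimal Python sketch, not ProphetFuzz's implementation. It counts how many option combinations a program with a given number of options admits, then shows the general idea of testing the highest-scored combinations first. The function names, the option flags, and the risk scores are all hypothetical; in the paper the predictions come from an LLM reading the program's documentation, whereas here the scores are hard-coded placeholders.

```python
from itertools import combinations
from math import comb


def count_combinations(num_options: int, max_size: int) -> int:
    """Count non-empty option subsets containing at most max_size options."""
    return sum(comb(num_options, k) for k in range(1, max_size + 1))


def rank_by_risk(options, scores, size, top_n):
    """Return the top_n combinations of the given size, highest score first.

    `scores` maps a frozenset of options to a hypothetical risk score;
    combinations without a score default to 0.0.
    """
    ranked = sorted(
        combinations(sorted(options), size),
        key=lambda combo: scores.get(frozenset(combo), 0.0),
        reverse=True,
    )
    return ranked[:top_n]


if __name__ == "__main__":
    # Even a modest CLI with 30 options has thousands of small combinations.
    print(count_combinations(30, 3))

    # Placeholder scores standing in for a model's predictions (assumption).
    example_scores = {
        frozenset({"-a", "-c"}): 0.9,
        frozenset({"-b", "-c"}): 0.4,
    }
    print(rank_by_risk(["-a", "-b", "-c"], example_scores, size=2, top_n=2))
```

Treating every combination as equally likely to be vulnerable means walking this whole space. Ranking lets a fixed fuzzing budget be spent on the few combinations judged most promising.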