Fuzzing with Quantitative and Adaptive Hot-Bytes Identification

Tai D. Nguyen, Long H. Pham, Jun Sun
2023-07-05
Abstract: Fuzzing has emerged as a powerful technique for finding security bugs in complicated real-world applications. American fuzzy lop (AFL), a leading fuzzing tool, has demonstrated its bug-finding ability through a vast number of reported CVEs. However, its random mutation strategy is unable to generate test inputs that satisfy complicated branching conditions (e.g., magic-byte comparisons, checksum tests, and nested if-statements), which are common in image decoders/encoders, XML parsers, and checksum tools. Existing approaches to this problem (such as Steelix and Neuzz) rely on unrealistic assumptions, for example that a branch condition can be satisfied byte by byte, or that the important bytes in the input (called hot-bytes) can be identified once and for all. In this work, we propose an approach called Finch, which is designed around the following principles. First, the relation between inputs and branching conditions is complicated, so we need not only an expressive model to capture this relation but also an informative measure so that the model can learn it effectively. Second, different branching conditions demand different hot-bytes, so we must adapt our fuzzing strategy depending on which branches are the current bottleneck. We implement our approach as an open-source project and compare its efficiency with other state-of-the-art fuzzers. Our evaluation results on 10 real-world programs and the LAVA-M dataset show that Finch achieves sustained increases in branch coverage and discovers more bugs than other fuzzers.
Cryptography and Security, Software Engineering
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

The paper addresses a critical issue in fuzzing: how to generate test inputs that cover complex branch conditions, and thereby uncover more security vulnerabilities. Specifically:

1. **Complex Branch Conditions**: Existing fuzzing tools (such as AFL) rely on random mutation, which struggles to generate test inputs that satisfy complex branch conditions (such as magic-byte comparisons, checksum tests, and nested if-statements). As a result, vulnerabilities guarded by these conditions can remain undiscovered.
2. **Improvement Strategy**: The paper proposes a new method called Finch, which improves fuzzing in two main ways:
   - Using expressive neural network models to identify "hot-bytes," the input bytes that are crucial for triggering specific branch conditions.
   - Dynamically adjusting the fuzzing strategy by selecting different hot-bytes for mutation based on the current bottleneck branches, thereby gradually narrowing the gap between the test inputs and the target branches.
3. **Experimental Results**: In evaluations on 10 real-world programs and the LAVA-M dataset, Finch achieves higher branch coverage and discovers more unique crashes than other state-of-the-art fuzzing tools.

In summary, the main objective of the paper is to make fuzzing tools more effective and efficient at handling complex branch conditions by introducing a quantitative and adaptive hot-byte identification mechanism.
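
The idea of quantitative, adaptive hot-byte identification can be illustrated with a toy sketch. This is not Finch's implementation: where the paper uses a neural network model, the sketch substitutes a simple perturbation probe, and the target checks (`magic_distance`, `length_distance`), the offsets, and all function names are hypothetical. It only shows the general loop: measure how far an input is from satisfying the bottleneck branch, find the bytes that influence that measure, mutate only those bytes, and repeat for the next branch.

```python
from typing import Callable, List, Tuple

Distance = Callable[[bytes], int]  # 0 means the branch condition is satisfied


def magic_distance(data: bytes) -> int:
    """Hypothetical magic-byte check: bytes 8..11 must equal b"FUZZ"."""
    return sum(abs(a - b) for a, b in zip(data[8:12], b"FUZZ"))


def length_distance(data: bytes) -> int:
    """Hypothetical nested check: the header byte at offset 0 must be 16."""
    return abs(data[0] - 16)


def find_hot_bytes(data: bytes, distance: Distance,
                   deltas: Tuple[int, ...] = (1, 64, 128)) -> List[int]:
    """Return offsets whose perturbation changes the branch distance."""
    base = distance(data)
    hot = []
    for i in range(len(data)):
        score = 0
        for d in deltas:
            probe = bytearray(data)
            probe[i] = (probe[i] + d) % 256
            score += abs(distance(bytes(probe)) - base)
        if score > 0:  # this byte influences the branch condition
            hot.append(i)
    return hot


def mutate_hot_bytes(data: bytes, distance: Distance, hot: List[int]) -> bytes:
    """Greedy search restricted to hot offsets; keeps strict improvements."""
    best = bytearray(data)
    for i in hot:
        for value in range(256):
            candidate = bytearray(best)
            candidate[i] = value
            if distance(bytes(candidate)) < distance(bytes(best)):
                best = candidate
    return bytes(best)


def fuzz(seed: bytes, branches: List[Distance]) -> Tuple[bytes, List[List[int]]]:
    """Handle branches in nesting order, re-identifying hot bytes for each."""
    data, trace = seed, []
    for distance in branches:
        if distance(data) == 0:
            continue  # not a bottleneck for this input
        hot = find_hot_bytes(data, distance)
        trace.append(hot)
        data = mutate_hot_bytes(data, distance, hot)
    return data, trace


if __name__ == "__main__":
    result, trace = fuzz(bytes(16), [magic_distance, length_distance])
    print(result, trace)
```

In this sketch the hot-byte set is recomputed whenever the bottleneck branch changes, which mirrors the adaptive principle described above: the bytes that matter for the magic-byte check are not the ones that matter for the nested header check.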