Gray-Box Fuzzing via Gradient Descent and Boolean Expression Coverage (Technical Report)

Martin Jonáš, Jan Strejček, Marek Trtík, Lukáš Urban
2024-01-23
Abstract: We present a novel gray-box fuzzing algorithm that monitors executions of instructions converting numerical values to Boolean ones. An important class of such instructions evaluates predicates, e.g., *cmp in LLVM. That alone allows us to infer the input dependency (cf. taint analysis) on the fly during fuzzing with reasonable accuracy, which in turn enables an effective use of gradient descent on these instructions (to invert the result of their evaluation). Although the fuzzing attempts to maximize the coverage of these instructions, there is an interesting correlation with the standard branch coverage, which we are able to achieve indirectly. The evaluation on Test-Comp 2023 benchmarks shows that our approach, despite being a pure gray-box fuzzer, is able to compete with the leading tools in the competition, which combine fuzzing with other powerful techniques like model checking, symbolic execution, or abstract interpretation.
Programming Languages
What problem does this paper attempt to address?
This paper aims to make gray-box fuzzing more effective, in particular at maximizing branch coverage. The authors propose a new gray-box fuzzing algorithm that monitors instructions converting numerical values to Boolean values (such as the `*cmp` instructions in LLVM) and uses gradient descent to invert the results of these instructions. This makes input generation more effective and, according to the evaluation on Test-Comp 2023 benchmarks, lets the approach compete with leading tools without relying on other complex techniques such as model checking, symbolic execution, or abstract interpretation.

### Key Problems and Solutions

1. **Problem Description**:
   - Traditional gray-box fuzzing is inefficient at generating inputs that trigger new paths.
   - It is difficult to identify the instructions in a program that convert numerical values to Boolean values (such as comparison instructions) and to exploit them for more precise input generation.
2. **Solution**:
   - **Monitor Numerical-to-Boolean Conversion**: Inserted monitoring code collects information (such as the comparison result and an associated double-precision floating-point value) each time a Boolean instruction is executed. This allows input dependencies to be inferred at runtime, so that gradient descent can adjust the input in an attempt to invert the result of the Boolean expression.
   - **Gradient-Descent Optimization**: For each Boolean instruction, the method computes the corresponding double-precision floating-point value and uses gradient descent to adjust the input step by step until it finds an input that inverts the result of the Boolean expression.
   - **Indirectly Maximize Branch Coverage**: Although the main goal is to maximize the coverage of Boolean instructions, this coverage correlates with standard branch coverage, which therefore improves indirectly.

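
The monitoring and coverage idea described above can be sketched roughly as follows. This is only an illustrative Python model, not the tool's actual instrumentation: the class `BooleanCoverage`, the function `program_under_test`, and the instruction ids `1` and `2` are hypothetical names introduced here, and the real approach instruments LLVM code rather than Python.

```python
from dataclasses import dataclass, field


@dataclass
class BooleanCoverage:
    """Hypothetical tracker: which outcomes each monitored instruction produced."""
    seen: dict = field(default_factory=dict)        # instruction id -> set of results
    last_value: dict = field(default_factory=dict)  # instruction id -> last recorded value

    def record(self, instr_id: int, result: bool, value: float) -> None:
        # Called each time a monitored Boolean instruction is executed.
        self.seen.setdefault(instr_id, set()).add(result)
        self.last_value[instr_id] = value

    def uncovered(self) -> list:
        # Instructions for which only one of the two outcomes was observed so far.
        return sorted(i for i, results in self.seen.items() if len(results) < 2)


def program_under_test(x: int, cov: BooleanCoverage) -> str:
    # Hypothetical "instrumented" program: every comparison reports its id,
    # its Boolean result, and the difference of its operands as a double.
    c1 = x < 10
    cov.record(1, c1, float(x) - 10.0)
    if c1:
        return "small"
    c2 = x == 42
    cov.record(2, c2, float(x) - 42.0)
    return "magic" if c2 else "large"


cov = BooleanCoverage()
for x in (3, 50):
    program_under_test(x, cov)
print(cov.uncovered())  # instruction 2 has only evaluated to False so far
```

In this model, the fuzzer's goal is to empty the `uncovered()` list; the recorded values give a numerical signal for how far an input is from flipping each remaining instruction.
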
### Formula Representation

- For a comparison instruction \( l \, \triangleright \, r \), where \( l \) and \( r \) are numerical registers and \( \triangleright \) is a comparison operator (one of \( =, \neq, <, \leq, >, \geq \)), the monitoring code calculates:

  \[ \text{value} = (\text{double})\, l - (\text{double})\, r \]

  This value is passed to the monitoring function:

  `__sbt_fizzer_process_condition(id, instr_result, value, xor)`

- During gradient descent, the input is fine-tuned in order to find an input that inverts the result of the Boolean expression. Let \( f(x) \) denote the value computed for the Boolean expression on input \( x \). The goal is to find two inputs \( u \) and \( v \) such that:

  \[ f(u) \cdot f(v) < 0 \]

  That is, \( f(u) \) and \( f(v) \) have opposite signs.

In this way, the method proposed in the paper generates inputs more effectively in gray-box fuzzing, thereby improving test coverage and the ability to discover vulnerabilities.
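
To make the sign-flip condition concrete, here is a minimal Python sketch of a gradient-style search for an input \( v \) with \( f(u) \cdot f(v) < 0 \). It is a simplified illustration under stated assumptions, not the paper's algorithm: the example condition `3*x[0] + x[1] < 20`, the helper names `branching_value`, `numeric_gradient`, and `find_sign_flip`, the finite-difference gradient, and the fixed step size are all assumptions made here.

```python
def branching_value(l: float, r: float) -> float:
    # value = (double) l - (double) r, as in the formula above
    return float(l) - float(r)


def f(x):
    # Hypothetical monitored condition: 3*x[0] + x[1] < 20
    return branching_value(3.0 * x[0] + x[1], 20.0)


def numeric_gradient(func, x, eps=1e-3):
    # Forward finite differences; the fuzzer has no symbolic derivative of f.
    base = func(x)
    grad = []
    for i in range(len(x)):
        shifted = list(x)
        shifted[i] += eps
        grad.append((func(shifted) - base) / eps)
    return grad


def find_sign_flip(func, u, step=0.1, max_iters=1000):
    """Move from u along the gradient until f changes sign; return v or None."""
    fu = func(u)
    v = list(u)
    for _ in range(max_iters):
        if fu * func(v) < 0:
            return v
        grad = numeric_gradient(func, v)
        # Descend if f(u) > 0, ascend if f(u) < 0, so f moves toward and past zero.
        direction = -1.0 if fu > 0 else 1.0
        v = [vi + direction * step * gi for vi, gi in zip(v, grad)]
    return None


u = [10.0, 5.0]          # f(u) = 15 > 0, so the comparison is False
v = find_sign_flip(f, u)
print(f(u), f(v))        # the two values have opposite signs
```

A sign change of \( f \) flips the outcome of a strict inequality such as `<`; equality comparisons would instead need \( f \) to reach exactly zero, which this sketch does not attempt.
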