Abstract: We present a novel gray-box fuzzing algorithm monitoring executions of instructions converting numerical values to Boolean ones. An important class of such instructions evaluates predicates, e.g., *cmp in LLVM. That alone allows us to infer the input dependency (cf. taint analysis) on the fly during fuzzing with reasonable accuracy, which in turn enables an effective use of gradient descent on these instructions (to invert the result of their evaluation). Although the fuzzing attempts to maximize the coverage of the instructions, there is an interesting correlation with the standard branch coverage, which we are able to achieve indirectly. The evaluation on Test-Comp 2023 benchmarks shows that our approach, despite being a pure gray-box fuzzer, is able to compete with the leading tools in the competition, which combine fuzzing with other powerful techniques like model checking, symbolic execution, or abstract interpretation.

What problem does this paper attempt to address?

The paper aims to make gray-box fuzzing more effective, in particular at maximizing branch coverage. The authors propose a new gray-box fuzzing algorithm that monitors instructions converting numerical values to Boolean values (such as the `*cmp` instructions in LLVM) and uses gradient descent to invert the results of these instructions. This makes input generation more effective and lets the fuzzer compete with current leading tools without relying on other complex techniques (such as model checking, symbolic execution, or abstract interpretation).

### Key Problems and Solutions

1. **Problem Description**:
   - Traditional gray-box fuzzing is inefficient at generating inputs that trigger new paths.
   - It is difficult to identify the instructions in a program that convert numerical values to Boolean values (such as comparison instructions) and to exploit them for more precise input generation.
2. **Solution**:
   - **Monitor Numerical-to-Boolean Conversion**: Inserted monitoring code collects relevant information (such as the comparison result and a double-precision floating-point value) each time a Boolean instruction is executed. This allows input dependencies to be inferred at runtime, and gradient descent to be used to adjust the input in an attempt to invert the result of the Boolean expression.
   - **Gradient-Descent Optimization**: For each Boolean instruction, the corresponding double-precision floating-point value is computed, and gradient descent gradually adjusts the input until it finds one that inverts the result of the Boolean expression.
   - **Indirectly Maximize Branch Coverage**: Although the main goal is to maximize the coverage of Boolean instructions, the method indirectly improves standard branch coverage as well.
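The monitoring idea described above can be illustrated with a minimal Python sketch. It is not the tool's instrumentation: the `ConditionMonitor` class, its method names, and the toy `program` are hypothetical, and treating a Boolean instruction as "covered" once both of its outcomes have been observed is an assumption made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class ConditionMonitor:
    """Toy stand-in for an instrumentation callback (hypothetical API)."""
    # Maps a condition id to the set of Boolean outcomes observed so far.
    outcomes: dict = field(default_factory=dict)
    # Most recent signed distance float(l) - float(r) per condition id.
    last_value: dict = field(default_factory=dict)

    def less_than(self, cond_id, l, r):
        """Evaluate l < r, record outcome and distance, return the outcome."""
        result = l < r
        self.outcomes.setdefault(cond_id, set()).add(result)
        self.last_value[cond_id] = float(l) - float(r)
        return result

    def uncovered(self):
        """Ids of conditions for which only one outcome has been seen."""
        return sorted(cid for cid, seen in self.outcomes.items() if len(seen) < 2)


def program(monitor, x, y):
    """Tiny program under test with two instrumented comparisons."""
    if monitor.less_than(0, x, 10):
        if monitor.less_than(1, y, x):
            return "inner"
        return "outer"
    return "skip"


monitor = ConditionMonitor()
program(monitor, 3, 7)
print(monitor.uncovered())  # conditions still missing one of their outcomes
print(monitor.last_value)   # signed distances a search could try to push across zero
```

A fuzzer built on such a monitor would pick one uncovered condition and search for an input that changes the sign of its recorded distance.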
### Formula Representation

- For a comparison instruction \( l \, \triangleright \, r \), where \( l \) and \( r \) are numerical registers and \(\triangleright\) is a comparison operator (such as `=`, `!=`, `<`, `≤`, `>`, `≥`), the monitoring code calculates:

  \[ \text{value} = (\text{double})\, l - (\text{double})\, r \]

  This value is passed to the monitoring function:

  \[ \texttt{\_\_sbt\_fizzer\_process\_condition}(\text{id}, \text{instr\_result}, \text{value}, \text{xor}) \]

- During gradient descent, the input is adjusted in small steps to find one that inverts the result of the Boolean expression. Let \( f(x) \) denote the value computed for the Boolean expression on input \( x \). The search looks for two inputs \( u \) and \( v \) such that:

  \[ f(u) \cdot f(v) < 0 \]

  That is, \( f(u) \) and \( f(v) \) have opposite signs.

In this way, the method proposed in the paper generates inputs more effectively in gray-box fuzzing, thereby improving test coverage and the ability to discover vulnerabilities.
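The sign-flip condition \( f(u) \cdot f(v) < 0 \) can be sketched as a small search in Python. This is only an illustration under simplifying assumptions: inputs are treated as real-valued vectors, the gradient is estimated by finite differences, and the function name `flip_sign` with its parameters `rate`, `eps`, and `max_iters` is hypothetical. The paper's algorithm works on actual fuzzing inputs and is not reproduced here.

```python
def flip_sign(f, u, rate=1.0, eps=1e-6, max_iters=100):
    """Search for v with f(u) * f(v) < 0 by following the gradient of f.

    Returns the found input as a list of floats, or None on failure.
    """
    fu = f(u)
    if fu == 0:
        return None  # f(u) has no sign to invert
    # Move f downward when f(u) > 0 and upward when f(u) < 0.
    direction = 1.0 if fu > 0 else -1.0
    v = [float(x) for x in u]
    for _ in range(max_iters):
        fv = f(v)
        if fu * fv < 0:
            return v  # signs differ: the comparison distance crossed zero
        # Finite-difference estimate of the partial derivatives of f at v.
        grad = []
        for i in range(len(v)):
            shifted = list(v)
            shifted[i] += eps
            grad.append((f(shifted) - fv) / eps)
        if all(g == 0 for g in grad):
            return None  # f does not react to the input near v
        v = [x - rate * direction * g for x, g in zip(v, grad)]
    return None


def distance(inp):
    """Value recorded for a comparison of x and y: (double) x - (double) y."""
    x, y = inp
    return float(x) - float(y)


u = [7.0, 3.0]
v = flip_sign(distance, u)
print(v, distance(u) * distance(v) < 0)
```

The fixed step size keeps the sketch short; a practical search would adapt the step and map the real-valued result back to a valid program input.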

Gray-Box Fuzzing via Gradient Descent and Boolean Expression Coverage (Technical Report)
