Abstract: Programs that take highly structured files as inputs normally process them in stages: syntax parsing, semantic checking, and application execution. Deep bugs are often hidden in the application execution stage, and automatically generating test inputs that trigger them is non-trivial. Mutation-based fuzzing generates test inputs by modifying well-formed seed inputs randomly or heuristically, but most of the resulting inputs are rejected at the early syntax parsing stage. In contrast, generation-based fuzzing generates inputs from a specification (e.g., a grammar), so its inputs can quickly carry fuzzing beyond the syntax parsing stage. However, most of these inputs fail semantic checking (e.g., by violating semantic rules), which limits their ability to discover deep bugs. In this paper, we propose a novel data-driven seed generation approach, named Skyfire, which leverages the knowledge in the vast number of existing samples to generate well-distributed seed inputs for fuzzing programs that process highly structured inputs. Skyfire takes a corpus and a grammar as inputs and works in two steps. The first step learns a probabilistic context-sensitive grammar (PCSG) that specifies both syntax features and semantic rules; the second step uses the learned PCSG to generate seed inputs. We fed the collected samples and the inputs generated by Skyfire to AFL as seeds to fuzz several open-source XSLT and XML engines (i.e., Sablotron, libxslt, and libxml2). The results demonstrate that Skyfire generates well-distributed inputs and thus significantly improves code coverage (on average, by 20% for line coverage and 15% for function coverage) and the bug-finding capability of fuzzers. We also used the inputs generated by Skyfire to fuzz the closed-source JavaScript and rendering engines of Internet Explorer 11.
Altogether, we discovered 19 new memory corruption bugs (16 of which are new vulnerabilities, earning 33.5k USD in bug bounty rewards) and 32 denial-of-service bugs.
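The abstract describes Skyfire's two steps (learn a PCSG, then generate seeds from it) only at a high level. The minimal Python sketch below illustrates the general idea and is not Skyfire's implementation. It counts production rules in parse trees of corpus samples, conditions each rule on the symbols of the node's parent and grandparent, and then samples new inputs in proportion to the learned conditional probabilities. The two-ancestor context, the tree encoding, the function names (learn_pcsg, generate_seeds), the depth limit, and the toy XML-like corpus are all hypothetical. The paper's actual context definition and generation heuristics may differ.

```python
import random
from collections import Counter, defaultdict

# Parse tree: (nonterminal, [children]); a child is a subtree or a terminal str.
# Rule right-hand sides are tagged as (is_terminal, value) so terminal text
# can never be confused with a nonterminal name.


def learn_pcsg(trees):
    """Estimate P(rhs | context, lhs), where context = (grandparent, parent).

    The two-ancestor context is an illustrative assumption."""
    counts = defaultdict(Counter)

    def walk(node, grandparent, parent):
        symbol, children = node
        rhs = tuple((True, c) if isinstance(c, str) else (False, c[0])
                    for c in children)
        counts[((grandparent, parent), symbol)][rhs] += 1
        for child in children:
            if not isinstance(child, str):
                walk(child, parent, symbol)

    for tree in trees:
        walk(tree, None, None)
    # Normalize counts into probabilities per (context, lhs).
    return {key: {rhs: n / sum(ctr.values()) for rhs, n in ctr.items()}
            for key, ctr in counts.items()}


class TooDeep(Exception):
    """Raised when a derivation exceeds the depth limit."""


def generate_one(pcsg, start, rng, max_depth):
    def expand(symbol, grandparent, parent, depth):
        if depth > max_depth:
            raise TooDeep
        rules = pcsg[((grandparent, parent), symbol)]
        # Sample a rule in proportion to its learned probability.
        rhs = rng.choices(list(rules), weights=list(rules.values()))[0]
        return "".join(value if is_terminal
                       else expand(value, parent, symbol, depth + 1)
                       for is_terminal, value in rhs)

    return expand(start, None, None, 0)


def generate_seeds(pcsg, start, n, seed=0, max_depth=50, max_tries=1000):
    """Generate up to n inputs, discarding derivations that grow too deep."""
    rng = random.Random(seed)
    seeds = []
    for _ in range(max_tries):
        if len(seeds) == n:
            break
        try:
            seeds.append(generate_one(pcsg, start, rng, max_depth))
        except TooDeep:
            continue  # drop the runaway derivation and try again
    return seeds


# Toy corpus of XML-like parse trees (hypothetical data).
corpus = [
    ("doc", [("elem", ["<a>", ("text", ["hi"]), "</a>"])]),
    ("doc", [("elem", ["<b>", ("elem", ["<b>", ("elem",
        ["<a>", ("text", ["yo"]), "</a>"]), "</b>"]), "</b>"])]),
]
pcsg = learn_pcsg(corpus)
seeds = generate_seeds(pcsg, "doc", n=10)
```

Because every rule is conditioned on its ancestors, this sketch never places "hi" inside a nested element. A context-free probabilistic grammar learned from the same two trees could produce, for example, `<b><a>hi</a></b>`. This is the kind of context that lets generated inputs respect structure found in real samples. Skyfire's reported semantic-rule handling goes further than this toy conditioning.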