Abstract: Effective DBMS fuzzing relies on high-quality initial seeds, which serve as the starting point for mutation. These initial seeds should exercise a variety of DBMS features so that the state space is explored thoroughly. While built-in test cases are typically used as initial seeds, many DBMSs lack comprehensive test cases, making it difficult to apply state-of-the-art fuzzing techniques directly. To address this, we propose Sedar, which produces initial seeds for a target DBMS by transferring test cases from other DBMSs. The underlying insight is that many DBMSs share similar functionality, so seeds that cover deep execution paths in one DBMS can be adapted for other DBMSs. The challenge lies in converting these seeds to a format supported by the grammar of the target DBMS. Sedar follows a three-step process to generate seeds. First, it executes existing SQL test cases within the DBMS they were designed for and captures the schema information during execution. Second, it uses large language models (LLMs), together with the captured schema information, to guide the generation of new test cases from the LLMs' responses. Lastly, to ensure that the test cases can be properly parsed and mutated by fuzzers, Sedar temporarily comments out the sections that the fuzzers cannot parse and uncomments them after mutation. We integrate Sedar into the DBMS fuzzers Squirrel and Griffin, targeting DBMSs such as Virtuoso, MonetDB, DuckDB, and ClickHouse. Evaluation results demonstrate significant improvements in both fuzzers. Specifically, compared to Squirrel and Griffin with non-transferred seeds, Sedar improves code coverage by 72.46%-214.84% and 21.40%-194.46%, respectively; compared to Squirrel and Griffin with the native test cases of these DBMSs as initial seeds, adding the transferred seeds of Sedar improves code coverage by 4.90%-16.20% and 9.73%-28.41%, respectively. Moreover, Sedar discovered 70 new vulnerabilities, 60 of which were found only with Sedar's transferred seeds, and 19 of which have been assigned CVEs.
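The third step (commenting out unparsable sections before mutation and restoring them afterwards) can be illustrated with a minimal Python sketch. This is not Sedar's implementation: the marker strings, the naive semicolon-based statement splitting, and the `toy_can_parse` predicate standing in for a fuzzer's grammar are all assumptions made for illustration.

```python
import re

# Hypothetical markers; the abstract does not say how masked sections are tagged.
MASK_OPEN = "/*SEED_MASK "
MASK_CLOSE = " SEED_MASK*/"


def split_statements(script):
    """Naively split on ';' (assumes no semicolons inside string literals)."""
    return [s.strip() for s in script.split(";") if s.strip()]


def mask_unparsable(script, can_parse):
    """Wrap every statement rejected by can_parse in a marked block comment."""
    out = []
    for stmt in split_statements(script):
        if can_parse(stmt):
            out.append(stmt + ";")
        else:
            # Assumes the statement itself contains no block-comment terminator.
            out.append(MASK_OPEN + stmt + ";" + MASK_CLOSE)
    return "\n".join(out)


def unmask(script):
    """Strip the markers so the masked statements become live SQL again."""
    pattern = re.escape(MASK_OPEN) + r"(.*?)" + re.escape(MASK_CLOSE)
    return re.sub(pattern, lambda m: m.group(1), script, flags=re.DOTALL)


def toy_can_parse(stmt):
    """Stand-in for a fuzzer grammar that knows only a few statement kinds."""
    return stmt.upper().startswith(("CREATE TABLE", "INSERT", "SELECT"))


seed = "CREATE TABLE t(a INT); PRAGMA threads=4; SELECT a FROM t;"
masked = mask_unparsable(seed, toy_can_parse)

# Simulate a mutation that only touches a statement the fuzzer can parse.
mutated = masked.replace("SELECT a FROM t", "SELECT a FROM t WHERE a > 0")
restored = unmask(mutated)
print(restored)
```

In this sketch the `PRAGMA` statement survives mutation untouched because the toy parser sees it only as a comment; a real integration would rely on the fuzzer's own parser to decide what to mask.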

Aster: Encoding Data Augmentation Relations into Seed Test Suites for Robustness Assessment and Fuzzing of Data-Augmented Deep Learning Models

Data Augmentation by Fuzzing for Neural Test Generation

Context-Aware Fuzzing for Robustness Enhancement of Deep Learning Models

WGAN-AFL: Seed Generation Augmented Fuzzer with Wasserstein-GAN

Towards Controlled Data Augmentations for Active Learning

AdaAugment: A Tuning-Free and Adaptive Approach to Enhance Data Augmentation

ISC4DGF: Enhancing Directed Grey-box Fuzzing with LLM-Driven Initial Seed Corpus Generation

SmartSeed: Smart Seed Generation for Efficient Fuzzing

Skyfire: Data-Driven Seed Generation for Fuzzing

A Novel Seed Generation Approach for Vulnerability Mining Based on Generative Adversarial Networks and Attention Mechanisms

CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems

DIAR: Removing Uninteresting Bytes from Seeds in Software Fuzzing

FDFuzz: Applying Feature Detection to Fuzz Deep Learning Systems

Deep AutoAugment

Better Pay Attention Whilst Fuzzing

Graphuzz: Data-driven Seed Scheduling for Coverage-guided Greybox Fuzzing

Boosting Model Resilience via Implicit Adversarial Data Augmentation

MetaAugment: Sample-Aware Data Augmentation Policy Learning

Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing Via Cross-DBMS SQL Transfer

SAFL: Increasing and Accelerating Testing Coverage with Symbolic Execution and Guided Fuzzing

EnFuzz: From Ensemble Learning to Ensemble Fuzzing