Sedar: Obtaining High-Quality Seeds for DBMS Fuzzing Via Cross-DBMS SQL Transfer

Jingzhou Fu,Jie Liang,Zhiyong Wu,Yu Jiang
DOI: https://doi.org/10.1145/3597503.3639210
2024-01-01
Abstract:Effective DBMS fuzzing relies on high-quality initial seeds, which serve as the starting point for mutation. These initial seeds should incorporate various DBMS features to explore the state space thoroughly. While built-in test cases are typically used as initial seeds, many DBMSs lack comprehensive test cases, making it difficult to apply state-of-the-art fuzzing techniques directly. To address this, we propose Sedar which produces initial seeds for a target DBMS by transferring test cases from other DBMSs. The underlying insight is that many DBMSs share similar functionalities, allowing seeds that cover deep execution paths in one DBMS to be adapted for other DBMSs. The challenge lies in converting these seeds to a format supported by the grammar of the target database. Sedar follows a three-step process to generate seeds. First, it executes existing SQL test cases within the DBMS they were designed for and captures the schema information during execution. Second, it utilizes large language models (LLMs) along with the captured schema information to guide the generation of new test cases based on the responses from the LLM. Lastly, to ensure that the test cases can be properly parsed and mutated by fuzzers, Sedar temporarily comments out unparsable sections for the fuzzers and uncomments them after mutation. We integrate Sedar into the DBMS fuzzers Sqirrel and Griffin, targeting DBMSs such as Virtuoso, MonetDB, DuckDB, and ClickHouse. Evaluation results demonstrate significant improvements in both fuzzers. Specifically, compared to Sqirrel and Griffin with non-transferred seeds, Sedar enhances code coverage by 72.46%-214.84% and 21.40%-194.46%; compared to Sqirrel and Griffin with native test cases of these DBMSs as initial seeds, incorporating the transferred seeds of Sedar results in an improvement in code coverage by 4.90%-16.20% and 9.73%-28.41%. Moreover, Sedar discovered 70 new vulnerabilities, with 60 out of them being uniquely found by Sedar with transferred seeds, and 19 of them have been assigned with CVEs.
What problem does this paper attempt to address?