Abstract: In recent years, the programming capabilities of large language models (LLMs) have garnered significant attention. Fuzz testing, a highly effective technique, plays a key role in enhancing software reliability and detecting vulnerabilities. However, traditional fuzz testing tools rely on manually crafted fuzz drivers, which can limit both testing efficiency and effectiveness. To address this challenge, we propose an automated fuzz testing method driven by a code knowledge graph and powered by an LLM-based intelligent agent system, referred to as CKGFuzzer. We approach fuzz driver creation as a code generation task, leveraging the knowledge graph of the code repository to automate the generation process within the fuzzing loop, while continuously refining both the fuzz driver and input seeds. The code knowledge graph is constructed through interprocedural program analysis, where each node in the graph represents a code entity, such as a function or a file. The knowledge graph-enhanced CKGFuzzer not only effectively resolves compilation errors in fuzz drivers and generates input seeds tailored to specific API usage scenarios, but also analyzes fuzz driver crash reports, assisting developers in improving code quality. By querying the knowledge graph of the code repository and learning from API usage scenarios, we can better identify testing targets and understand the specific purpose of each fuzz driver. We evaluated our approach using eight open-source software projects. The experimental results indicate that CKGFuzzer achieved an average improvement of 8.73% in code coverage compared to state-of-the-art techniques. Additionally, CKGFuzzer reduced the manual review workload in crash case analysis by 84.4% and successfully detected 11 real bugs (including nine previously unreported bugs) across the tested libraries.
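
To make the idea of a code knowledge graph concrete, the following is a minimal illustrative sketch, not CKGFuzzer's actual implementation. It assumes a graph whose nodes are code entities (files and functions) and whose edges carry a relation label such as `contains` or `calls`. The class name, relation labels, and the example entities (`api.c`, `lib_decode`, and so on) are all hypothetical. The query shows how following call edges from a public API could surface the internal functions a fuzz driver for that API would exercise.

```python
from collections import defaultdict


class CodeKnowledgeGraph:
    """Toy code knowledge graph (hypothetical, for illustration only)."""

    def __init__(self):
        self.nodes = {}                # entity name -> kind ("file" or "function")
        self.edges = defaultdict(set)  # (source, relation) -> set of targets

    def add_node(self, name, kind):
        self.nodes[name] = kind

    def add_edge(self, src, relation, dst):
        # Both endpoints must already be registered as code entities.
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError(f"unknown entity in edge {src} -> {dst}")
        self.edges[(src, relation)].add(dst)

    def neighbors(self, name, relation):
        """Direct targets of `name` along edges labeled `relation`."""
        return sorted(self.edges.get((name, relation), ()))

    def reachable_callees(self, func):
        """All functions transitively reachable from `func` via 'calls' edges."""
        seen = set()
        stack = [func]
        while stack:
            current = stack.pop()
            for callee in self.edges.get((current, "calls"), ()):
                if callee not in seen:
                    seen.add(callee)
                    stack.append(callee)
        return sorted(seen)


# Hypothetical repository: one public API and three internal helpers.
graph = CodeKnowledgeGraph()
graph.add_node("api.c", "file")
graph.add_node("parser.c", "file")
for fn in ("lib_decode", "parse_header", "parse_body", "read_chunk"):
    graph.add_node(fn, "function")

graph.add_edge("api.c", "contains", "lib_decode")
for fn in ("parse_header", "parse_body", "read_chunk"):
    graph.add_edge("parser.c", "contains", fn)

graph.add_edge("lib_decode", "calls", "parse_header")
graph.add_edge("lib_decode", "calls", "parse_body")
graph.add_edge("parse_body", "calls", "read_chunk")

# Functions a fuzz driver targeting lib_decode would indirectly exercise.
targets = graph.reachable_callees("lib_decode")
print(targets)
```

In a setting like the one the abstract describes, the result of such a query could be handed to an LLM as context when it drafts or repairs a fuzz driver; how CKGFuzzer actually structures and queries its graph is not specified here.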

Data Augmentation by Fuzzing for Neural Test Generation

LAFuzz: Neural Network for Efficient Fuzzing

CAGFuzz: Coverage-Guided Adversarial Generative Fuzzing Testing of Deep Learning Systems

VecSeeds: Generate Fuzzing Testcases from Latent Vectors Based on VAE-GAN

Format-aware Learn&Fuzz: Deep Test Data Generation for Efficient Fuzzing

TensorFuzz: Debugging Neural Networks with Coverage-Guided Fuzzing

CoCoFuzzing: Testing Neural Code Models With Coverage-Guided Fuzzing

Graph-based Fuzz Testing for Deep Learning Inference Engines

SpeedNeuzz: Speed Up Neural Program Approximation with Neighbor Edge Knowledge

FDFuzz: Applying Feature Detection to Fuzz Deep Learning Systems

Fuzzing JavaScript Engines with a Syntax-Aware Neural Program Model

Aster: Encoding Data Augmentation Relations into Seed Test Suites for Robustness Assessment and Fuzzing of Data-Augmented Deep Learning Models

Not all bytes are equal: Neural byte sieve for fuzzing

An Intelligent Fuzzing Data Generation Method Based on Deep Adversarial Learning

FairFuzz: a targeted mutation strategy for increasing greybox fuzz testing coverage

GenAug: Data Augmentation for Finetuning Text Generators

CKGFuzzer: LLM-Based Fuzz Driver Generation Enhanced By Code Knowledge Graph

Fuzzing with Optimized Grammar-Aware Mutation Strategies

V-Fuzz: Vulnerability-Oriented Evolutionary Fuzzing

FuzzCoder: Byte-level Fuzzing Test via Large Language Model