Abstract: Data-serialization libraries are essential tools in software development, responsible for converting between programmable data structures and data persistence formats. Among them, JSON is the most popular choice for exchanging data between different systems and programming languages, while JSON libraries serve as the programming toolkit for this task. Despite their widespread use, bugs in JSON libraries can cause severe issues such as data inconsistencies and security vulnerabilities. Unit test generation techniques are widely adopted to identify bugs in various libraries. However, there is limited systematic testing effort specifically for exposing bugs within JSON libraries in industrial practice. In this paper, we propose JSONTestGen, an approach leveraging large language models (LLMs) to generate unit tests for fastjson2, a popular open source JSON library from Alibaba. Pre-trained on billions of open-source text and code corpora, LLMs have demonstrated remarkable abilities in programming tasks. Based on historical bug-triggering unit tests, we utilize LLMs to generate more diverse test cases by incorporating JSON domain-specific mutation rules. To systematically and efficiently identify potential bugs, we adopt differential testing on the results of the generated unit tests. Our evaluation shows that JSONTestGen outperforms existing test generation tools in unknown defect detection. With JSONTestGen, we found 34 real bugs in fastjson2, 30 of which have already been fixed, including 12 non-crashing bugs. While manual inspection reveals that LLM-generated tests can be erroneous, particularly with self-contradictory assertions, we demonstrate that LLMs have the potential for classifying false-positive test failures. This suggests a promising direction for improved test oracle automation in the future.


### What problems does this paper attempt to solve?

This paper targets latent bugs and vulnerabilities in JSON libraries, specifically Alibaba's open-source fastjson2 library. The authors propose **JSONTestGen**, an approach based on large language models (LLMs) that generates unit tests to uncover unknown bugs in fastjson2 more systematically and efficiently, including non-crashing functional bugs.

#### Background and Motivation

1. **Importance of JSON Libraries**
   - JSON is a widely used data-exchange format in modern software development.
   - JSON libraries convert between programmable data structures and persistent data formats, and support data exchange between different systems and programming languages.
2. **Existing Problems**
   - Although JSON libraries are widely used, they may contain bugs with serious consequences, such as data inconsistency and security vulnerabilities.
   - Existing unit-test-generation techniques can identify certain types of bugs, but there is little testing effort aimed specifically at JSON libraries, especially for detecting non-crashing logic errors.
3. **Research Motivation**
   - Use large language models to automatically generate more diverse unit tests that cover the APIs of JSON libraries more comprehensively.
   - Use differential testing to compare the results of different versions or implementations and identify potential unknown bugs.

#### Solution

The authors propose a method named **JSONTestGen**, with the following main steps:

1. **Collect Historical Unit Tests**
   - Collect unit tests related to historical issues from the GitHub repository of fastjson2 as the original data set.
2. **Understanding Stage**
   - Use large language models to summarize the original unit tests and extract key information such as target APIs and core operations.
3. **Generation Stage**
   - Combine the summary information with JSON domain-specific mutation rules, and use large language models to generate new unit tests.
4. **Differential Testing**
   - Execute the newly generated unit tests and identify potential bugs by comparing the results of different JSON library implementations.

#### Main Contributions

- **First Application of Large Language Models to JSON Library Bug Detection**: Learn from existing unit tests to automatically generate diverse test cases for bug detection.
- **Design of Effective Prompting Strategies**: Combine JSON-specific mutation rules to guide large language models toward high-quality unit tests.
- **Successful Discovery of Unknown Bugs**: Discover 34 previously unknown bugs in fastjson2, 12 of which are non-crashing bugs that are difficult for existing tools to detect.
- **Exploration of Improved Test Automation**: Analyze failure cases and explore the potential of large language models for identifying false-positive test failures caused by incorrect test logic.

With this method, the authors demonstrate the potential of large language models in software testing, especially for improving test coverage and discovering complex bugs.
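The mutation and differential-testing steps in the Solution can be illustrated with a minimal Python sketch. It is not the authors' implementation and does not use fastjson2: `mutate`, `reference_parse`, `candidate_parse`, and `differential_test` are hypothetical names, the three mutation rules are invented for illustration, and the "library under test" is Python's standard `json` module with a deliberately planted non-crashing bug (fractional numbers are silently truncated).

```python
import json
from typing import Any, Callable, Iterator, List, Tuple

Outcome = Tuple[str, Any]  # ("ok", parsed_value) or ("error", None)


def outcome(parse: Callable[[str], Any], text: str) -> Outcome:
    """Run a parser and normalise the result so crashes and values are comparable."""
    try:
        return ("ok", parse(text))
    except Exception:
        return ("error", None)


def mutate(seed: Any) -> Iterator[Any]:
    """Yield simple variants of a seed value (illustrative rules, not the paper's)."""
    yield seed            # the seed itself
    yield [seed]          # wrapped in an array
    yield {"k": seed}     # wrapped in an object


def reference_parse(text: str) -> Any:
    """Reference implementation: the standard library parser."""
    return json.loads(text)


def candidate_parse(text: str) -> Any:
    """Stand-in for the library under test, with a planted non-crashing bug:
    fractional numbers are truncated to integers instead of raising an error."""
    return json.loads(text, parse_float=lambda s: int(float(s)))


def differential_test(seeds: List[Any]) -> List[Tuple[str, Outcome, Outcome]]:
    """Return every input on which the two implementations disagree."""
    discrepancies = []
    for seed in seeds:
        for variant in mutate(seed):
            text = json.dumps(variant)
            expected = outcome(reference_parse, text)
            actual = outcome(candidate_parse, text)
            if expected != actual:
                discrepancies.append((text, expected, actual))
    return discrepancies


if __name__ == "__main__":
    for text, expected, actual in differential_test([1, 2.5, "text", None]):
        print(f"{text!r}: reference={expected} candidate={actual}")
```

Neither parser crashes on these inputs, so a crash-only oracle would report nothing; the disagreement between the two outcomes is what exposes the truncation. In a real setup the second implementation would be another library version or another JSON library rather than a planted bug.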

Advancing Bug Detection in Fastjson2 with Large Language Models Driven Unit Test Generation
