A Multi-Agent Approach for REST API Testing with Semantic Graphs and LLM-Driven Inputs

Myeongsoo Kim, Tyler Stennett, Saurabh Sinha, Alessandro Orso
2024-11-12
Abstract: As modern web services increasingly rely on REST APIs, their thorough testing has become crucial. Furthermore, the advent of REST API specifications such as the OpenAPI Specification has led to the emergence of many black-box REST API testing tools. However, these tools often focus on individual test elements in isolation (e.g., APIs, parameters, values), resulting in lower coverage and less effectiveness in detecting faults (i.e., 500 response codes). To address these limitations, we present AutoRestTest, the first black-box framework to adopt a dependency-embedded multi-agent approach for REST API testing, integrating Multi-Agent Reinforcement Learning (MARL) with a Semantic Property Dependency Graph (SPDG) and Large Language Models (LLMs). Our approach treats REST API testing as a separable problem, where four agents -- API, dependency, parameter, and value -- collaborate to optimize API exploration. LLMs handle domain-specific value restrictions, the SPDG model simplifies the search space for dependencies using a similarity score between API operations, and MARL dynamically optimizes the agents' behavior. Evaluated on 12 real-world REST services, AutoRestTest outperforms the four leading black-box REST API testing tools, including those assisted by RESTGPT (which augments realistic test inputs using LLMs), in terms of code coverage, operation coverage, and fault detection. Notably, AutoRestTest is the only tool able to identify an internal server error in Spotify. Our ablation study underscores the significant contributions of the agent learning, SPDG, and LLM components.
Software Engineering, Artificial Intelligence
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses several key problems in REST API testing for modern web services:

1. **Low coverage**: Existing REST API testing tools usually focus on a single test element (such as an API, a parameter, or a value), resulting in low code coverage and operation coverage, especially on large-scale REST services (e.g., Language Tool, Genome Nexus, Market, Spotify, OhSome).
2. **Insufficient fault-detection ability**: Existing tools perform poorly at detecting internal server errors (i.e., 500 response codes), so potential security vulnerabilities and functional defects may remain undetected.
3. **Isolated testing strategies**: Existing tools handle each test step (operation selection, dependency identification, parameter selection, value generation) in isolation rather than as part of a coordinated strategy, which can produce sub-optimal test sequences and a large number of invalid requests.

### Solutions

To solve these problems, the paper proposes AutoRestTest, a new black-box testing framework that integrates the following key technologies:

1. **Semantic Property Dependency Graph (SPDG)**: Uses cosine similarity between input and output names to prioritize likely property dependencies, thereby reducing the search space for operation dependencies.
2. **Multi-Agent Reinforcement Learning (MARL)**: Adopts multi-agent value-decomposition Q-learning so that four specialized agents (API agent, dependency agent, parameter agent, value agent) cooperate to optimize the testing process.
3. **Large Language Models (LLMs)**: Generate realistic parameter values that conform to the specification and handle domain-specific value restrictions.

### Specific methods

1. **Initialization phase**:
   - **Parse OpenAPI specifications**: Extract endpoint information, parameters, and request and response schemas.
   - **Construct SPDG**: Identify potential dependencies between operations based on semantic similarity, then verify and refine these dependencies using actual server responses.
   - **Initialize REST agents**: Create a Q-table for each agent (operation agent, parameter agent, value agent, dependency agent), which defines that agent's specific responsibility in the testing process. The operation agent is the one the abstract calls the API agent.
2. **Test execution phase**:
   - **Operation selection**: The operation agent selects the next API operation according to its learned Q-values and its exploration strategy.
   - **Parameter selection**: The parameter agent determines which parameters to include, considering the required and optional parameters in the specification.
   - **Value generation**: The value agent generates parameter values from one of three sources: dependencies identified by the dependency agent, values generated by the LLM, or default assignments for basic parameter types.
   - **Request generation**: The request generator constructs API requests, and a mutation component modifies 20% of them to test error handling and trigger potential 500 response codes.
   - **Update Q-tables**: The Q-tables of all agents are updated according to the server response, SPDG dependencies are verified and refined, and 500 responses are stored for the final test report.

### Experimental results

The paper evaluates AutoRestTest on 12 real-world RESTful services and compares it with four state-of-the-art REST API testing tools. AutoRestTest outperforms the other tools in code coverage, operation coverage, and fault detection. Notably, it is the only tool that detected an internal server error in the Spotify service.

### Main contributions

1. **A new REST API testing technique**: It reduces the search space for operation dependencies through a semantic similarity graph model, optimizes the test steps using multi-agent reinforcement learning, and generates realistic test inputs with the help of large language models.
2. **Experimental evidence of AutoRestTest's effectiveness**: It outperforms existing tools on multiple metrics, covering more operations, achieving higher code coverage, and triggering more service failures.
3. **Comprehensive resources**: The AutoRestTest tool, the benchmark services, and detailed experimental results are provided for subsequent research and applications.
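
The SPDG idea summarized under Solutions (scoring input/output name pairs by cosine similarity and keeping the most promising dependencies) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the character-trigram vectors are a stand-in for whatever embedding the paper uses, and the `threshold` value, the `build_spdg` function, and the example operations are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngram_vector(name: str, n: int = 3) -> Counter:
    """Stand-in embedding (assumption): counts of character n-grams of a name."""
    text = f"#{name.lower()}#"
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_spdg(operations: dict, threshold: float = 0.5) -> dict:
    """Map each consumer operation to candidate (producer, input, output, score)
    edges whose name similarity reaches the (hypothetical) threshold, best first."""
    graph = {op: [] for op in operations}
    for consumer, c_spec in operations.items():
        for producer, p_spec in operations.items():
            if consumer == producer:
                continue
            for inp in c_spec["inputs"]:
                for out in p_spec["outputs"]:
                    score = cosine_similarity(ngram_vector(inp), ngram_vector(out))
                    if score >= threshold:
                        graph[consumer].append((producer, inp, out, score))
        graph[consumer].sort(key=lambda edge: edge[3], reverse=True)
    return graph


# Hypothetical operations, as they might be extracted from an OpenAPI spec.
operations = {
    "createUser": {"inputs": ["name", "email"], "outputs": ["userId", "email"]},
    "getOrder": {"inputs": ["userId", "orderId"], "outputs": ["total"]},
}
spdg = build_spdg(operations)
for consumer, edges in spdg.items():
    print(consumer, edges)
```

In this toy example only `getOrder.userId` is linked to `createUser.userId`, so a tester would try `createUser` first when it needs a value for that parameter instead of searching every operation pair.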
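
The "Initialize REST agents" and "Update Q-tables" steps can be sketched in the same spirit. The code below shows value-decomposition Q-learning in a deliberately simplified form: each agent keeps its own Q-table, the joint value is taken as the sum of the agents' values for their chosen actions, and every agent is updated from the shared temporal-difference error. The stateless single-step update, the reward scheme in `reward_for`, and the hyperparameters are assumptions for illustration, not details taken from the paper.

```python
import random
from collections import defaultdict


class Agent:
    """One agent with its own Q-table and an epsilon-greedy policy."""

    def __init__(self, actions, epsilon=0.1, rng=None):
        self.actions = list(actions)
        self.epsilon = epsilon
        self.rng = rng or random.Random()
        self.q = defaultdict(float)  # Q-table: action -> estimated value

    def select(self):
        # Explore with probability epsilon, otherwise pick the best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[a])


def reward_for(status: int) -> float:
    """Hypothetical reward: favor successful (2xx) and server-error (5xx)
    responses, penalize everything else (e.g., 4xx invalid requests)."""
    if 200 <= status < 300 or status >= 500:
        return 1.0
    return -1.0


def update_agents(agents, chosen, reward, alpha=0.1):
    """Value decomposition: the joint Q is the sum of per-agent Q-values, and
    each agent takes a step along the shared TD error (no bootstrapping here)."""
    joint_q = sum(agent.q[action] for agent, action in zip(agents, chosen))
    td_error = reward - joint_q
    for agent, action in zip(agents, chosen):
        agent.q[action] += alpha * td_error


# Four agents with hypothetical action sets; epsilon=0 makes the demo deterministic.
agents = [
    Agent(["GET /users", "POST /users"], epsilon=0.0),  # operation (API) agent
    Agent(["use_dependency", "no_dependency"], epsilon=0.0),  # dependency agent
    Agent(["required_only", "all_params"], epsilon=0.0),  # parameter agent
    Agent(["llm_value", "default_value"], epsilon=0.0),  # value agent
]
chosen = [agent.select() for agent in agents]
update_agents(agents, chosen, reward_for(200))
update_agents(agents, chosen, reward_for(200))
print(chosen, [round(agent.q[action], 2) for agent, action in zip(agents, chosen)])
```

Because all agents learn from one shared error signal, a combination of choices that yields a useful response is reinforced jointly rather than each step being tuned in isolation.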
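
The "Request generation" step says that a mutation component modifies 20% of the requests. A minimal sketch of such a gate follows. Only the 20% rate comes from the summary; the request shape and the three mutation strategies (dropping a parameter, changing its type, emptying its value) are hypothetical examples.

```python
import copy
import random

MUTATION_RATE = 0.2  # from the summary: 20% of requests are mutated


def mutate(request: dict, rng: random.Random) -> dict:
    """Return a mutated copy of the request; the strategies are hypothetical."""
    mutated = copy.deepcopy(request)
    params = mutated["params"]
    if not params:
        return mutated
    name = rng.choice(sorted(params))
    strategy = rng.choice(["drop", "wrong_type", "empty"])
    if strategy == "drop":
        del params[name]  # omit the parameter entirely
    elif strategy == "wrong_type":
        value = params[name]
        # Swap numbers for strings and everything else for a number.
        params[name] = str(value) if isinstance(value, (int, float)) else 12345
    else:
        params[name] = ""  # send an empty value
    return mutated


def generate_request(request: dict, rng: random.Random):
    """Return (request_to_send, was_mutated), mutating with probability 0.2."""
    if rng.random() < MUTATION_RATE:
        return mutate(request, rng), True
    return request, False


rng = random.Random(0)  # seeded so that runs are repeatable
base = {"method": "GET", "path": "/users", "params": {"id": 42, "name": "alice"}}
results = [generate_request(base, rng) for _ in range(10_000)]
mutation_fraction = sum(flag for _, flag in results) / len(results)
print(f"mutated fraction: {mutation_fraction:.3f}")
```

Working on a deep copy keeps the valid request intact, so the same base request can still be sent unmodified on later iterations.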