Dinkel: Testing Graph Database Engines via State-Aware Query Generation

Dominic Wüst, Zu-Ming Jiang, Zhendong Su
2024-08-14
Abstract: Graph database management systems (GDBMSs) store and manipulate graph data and form a core part of many data-driven applications. To ensure their reliability, several approaches have been proposed to test GDBMSs by generating queries in Cypher, the most popular graph query language. However, Cypher allows queries with complicated state changes and data dependencies, which existing approaches do not support and thus fail to generate valid, complex queries, thereby missing many bugs in GDBMSs. In this paper, we propose a novel state-aware testing approach to generate complex Cypher queries for GDBMSs. Our approach models two kinds of graph state, query context and graph schema. Query context describes the available Cypher variables and their corresponding scopes, whereas graph schema summarizes the manipulated graph labels and properties. While generating Cypher queries, we modify the graph states on the fly to ensure each clause within the query can reference the correct state information. In this way, our approach can generate Cypher queries with multiple state changes and complicated data dependencies while retaining high query validity. We implemented this approach as a fully automatic GDBMS testing framework, Dinkel, and evaluated it on three popular open-source GDBMSs, namely Neo4j, RedisGraph, and Apache AGE. In total, Dinkel found 60 bugs, among which 58 were confirmed and 51 fixed. Our evaluation results show that Dinkel can effectively generate complex queries with high validity (93.43%). Compared to existing approaches, Dinkel can cover over 60% more code and find more bugs within the 48-hour testing campaign. We expect Dinkel's powerful test-case generation to benefit GDBMS testing and help strengthen the reliability of GDBMSs.
Databases, Software Engineering
What problem does this paper attempt to address?
This paper addresses shortcomings in how graph database management systems (GDBMSs) are tested, in particular the difficulty of generating Cypher queries that are both complex and valid. Specifically:

1. **Limitations of Existing Methods**:
   - Existing GDBMS testing methods mainly generate queries from simple templates and pair them with test oracles to identify the errors those queries trigger.
   - These methods do not systematically model state changes in the Cypher query language, so the queries they generate are relatively simple and do not cover complex multi-clause queries or complicated data dependencies.
   - As a result, they miss many bugs that lie in deeper logic.
2. **The Solution Proposed in the Paper**:
   - To improve the reliability of GDBMSs, the paper proposes a novel state-aware testing approach that generates complex, valid Cypher queries.
   - The approach introduces two abstractions, **query context** and **graph schema**, to model the graph state that a GDBMS maintains while processing a query.
   - The **query context** describes the temporary variables declared in each clause, together with their scopes and types.
   - The **graph schema** stores the graph labels and properties available in each clause.
   - During generation, the approach updates this state information on the fly so that each clause references the correct current state, which yields valid queries with multiple state changes and complicated data dependencies.
3. **Specific Implementation**:
   - The paper implements a fully automatic GDBMS testing framework named Dinkel, which generates complex Cypher queries and was evaluated on three popular open-source GDBMSs (Neo4j, RedisGraph, and Apache AGE).
   - The evaluation shows that, compared with existing approaches, Dinkel covers over 60% more code and finds more bugs within a 48-hour testing campaign.

In summary, this paper addresses the inability of existing GDBMS testing methods to generate complex Cypher queries by introducing state-aware query generation, thereby improving the reliability of GDBMSs.
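
To make the idea of state-aware generation more concrete, the following is a minimal sketch in Python. It is not Dinkel's implementation: the class names `QueryContext` and `GraphSchema` are borrowed from the abstractions described above, but their fields, the function `generate_query`, the small clause set (CREATE, MATCH, SET, WITH, RETURN), and the naming scheme for variables, labels, and properties are all assumptions made for illustration. The sketch only tracks which variables are in scope and which labels and properties have been introduced; it ignores most other Cypher rules, so its output is not guaranteed to be accepted by any particular engine.

```python
import random
from dataclasses import dataclass, field


@dataclass
class QueryContext:
    """Variables currently in scope (hypothetical simplification: node variables only)."""
    variables: list = field(default_factory=list)
    counter: int = 0

    def fresh(self):
        # Declare a new, never-before-used variable and put it in scope.
        name = f"n{self.counter}"
        self.counter += 1
        self.variables.append(name)
        return name


@dataclass
class GraphSchema:
    """Labels and property keys introduced so far (hypothetical simplification)."""
    labels: set = field(default_factory=lambda: {"L0"})
    properties: set = field(default_factory=lambda: {"p0"})


def generate_query(rng, n_clauses=5):
    """Generate a multi-clause query, updating both states after every clause."""
    ctx, schema = QueryContext(), GraphSchema()
    clauses = []
    for _ in range(n_clauses):
        # Offer only clauses whose state requirements are currently met.
        options = ["CREATE"]
        # Assumption: a reading clause may not directly follow an updating one.
        if not clauses or not clauses[-1].startswith(("CREATE", "SET")):
            options.append("MATCH")
        if ctx.variables:
            options += ["SET", "WITH"]
        kind = rng.choice(options)

        if kind == "CREATE":
            # Reuse a known label or introduce a new one, then record it.
            label = rng.choice(sorted(schema.labels) + [f"L{len(schema.labels)}"])
            schema.labels.add(label)
            clauses.append(f"CREATE ({ctx.fresh()}:{label})")
        elif kind == "MATCH":
            label = rng.choice(sorted(schema.labels))
            clauses.append(f"MATCH ({ctx.fresh()}:{label})")
        elif kind == "SET":
            # Only reference a variable that is still in scope.
            var = rng.choice(ctx.variables)
            prop = rng.choice(sorted(schema.properties) + [f"p{len(schema.properties)}"])
            schema.properties.add(prop)
            clauses.append(f"SET {var}.{prop} = {rng.randint(0, 9)}")
        else:  # WITH narrows the scope: variables not listed are no longer usable.
            kept = rng.sample(ctx.variables, rng.randint(1, len(ctx.variables)))
            ctx.variables = kept
            clauses.append("WITH " + ", ".join(kept))

    clauses.append("RETURN " + (", ".join(ctx.variables) if ctx.variables else "1"))
    return "\n".join(clauses)


if __name__ == "__main__":
    print(generate_query(random.Random(0), n_clauses=6))
```

The point of the sketch is the ordering: each clause is chosen and filled in from the current state, and the state is updated before the next clause is generated. A template-based generator that emits clauses independently has no way to know that a variable was dropped by an earlier `WITH`, which is one way invalid queries can arise.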