An Empirical Study on the Characteristics of Database Access Bugs in Java Applications

Wei Liu, Shouvick Mondal, Tse-Hsun Chen
2024-05-24
Abstract: Database-backed applications rely on the database access code to interact with the underlying database management systems (DBMSs). Although many prior studies aim at database access issues like SQL anti-patterns or SQL code smells, there is a lack of study of database access bugs during the maintenance of database-backed applications. In this paper, we empirically investigate 423 database access bugs collected from seven large-scale Java open source applications that use relational database management systems (e.g., MySQL or PostgreSQL). We study the characteristics (e.g., occurrence and root causes) of the bugs by manually examining the bug reports and commit histories. We find that the numbers of reported database access and non-database access bugs share a similar trend, but the files modified in their bug-fixing commits differ. Additionally, we generalize categories of the root causes of database access bugs, containing five main categories (SQL queries, Schema, API, Configuration, SQL query result) and 25 unique root causes. We find that the bugs pertaining to SQL queries, Schema, and API cover 84.2% of database access bugs across all studied applications. In particular, SQL query bugs (54%) and API bugs (38.7%) are the most frequent issues when using JDBC and Hibernate, respectively. Finally, we provide a discussion on the implications of our findings for developers and researchers.
Software Engineering, Databases
What problem does this paper attempt to address?
The paper studies the characteristics of database access bugs in Java applications. Specifically, it addresses the following questions:
1. **Understanding the occurrence trends of database access bugs**: The researchers explore whether these bugs are concentrated at a specific stage of the development cycle or evenly distributed throughout the lifecycle. This helps developers understand when to pay special attention to such bugs.
2. **Identifying the root causes of database access bugs**: By analyzing real cases, the researchers summarize and categorize the main causes of these bugs. This helps developers identify and avoid common pitfalls.
3. **Comparing the impact of different database access technologies**: The research also explores how the characteristics of database access bugs differ between Java Database Connectivity (JDBC) and Object-Relational Mapping (ORM) frameworks. This can provide guidance for choosing an appropriate access technology.
Through an empirical study of 423 database access bugs collected from seven large open-source Java applications, the authors found:
- The number of reported database access bugs follows a trend similar to that of non-database access bugs, but the types of files modified during fixing differ, indicating that the two types of bugs may have different root causes.
- Based on a manual analysis of root causes, the researchers derived five main categories (SQL queries, schema, API, configuration, SQL query results) and 25 unique root causes. Bugs related to SQL queries, schema, and API account for 84.2% of the database access bugs across all studied applications.
- In applications using JDBC, SQL query-related bugs are the most common (54%), while in applications using Hibernate, API-related bugs are the most common (38.7%).
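To make the "SQL queries" category more concrete, here is a minimal sketch of one plausible defect of that kind in hand-assembled JDBC query text. It is a hypothetical illustration, not drawn from the studied bug reports: the class name, the method names, and the `users` table with its columns are all invented for this example. No database connection is opened; the sketch only shows how the query string itself can go wrong.

```java
public class SqlQueryBugSketch {

    // Hypothetical buggy version: the space before WHERE is missing, so the
    // table name and the keyword fuse into "usersWHERE" and the DBMS would
    // reject the statement when it is prepared.
    static String buggyQuery(String table) {
        return "SELECT id, name FROM " + table + "WHERE active = ?";
    }

    // Hypothetical fix: the whitespace is restored. The filter value is still
    // supplied through a '?' placeholder rather than concatenated in.
    static String fixedQuery(String table) {
        return "SELECT id, name FROM " + table + " WHERE active = ?";
    }

    public static void main(String[] args) {
        System.out.println(buggyQuery("users"));
        System.out.println(fixedQuery("users"));
    }
}
```

In real JDBC code, a string like this would typically be passed to `Connection.prepareStatement`, so an error of this sort surfaces only at run time, when the statement reaches the database.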
In conclusion, this study provides an in-depth understanding of the characteristics of database access bugs, helping developers and researchers improve code quality and reduce the occurrence of such bugs.