Data-access performance anti-patterns in data-intensive systems

Muse, Biruk Asmare,Nafi, Kawser Wazed,Khomh, Foutse,Antoniol, Giuliano
DOI: https://doi.org/10.1007/s10664-024-10535-8
IF: 3.762
2024-08-31
Empirical Software Engineering
Abstract:Data-intensive systems handle variable, high-volume, and high-velocity data generated by human and digital devices. Like traditional software, data-intensive systems are prone to technical debts introduced to cope-up with the pressure of time and resource constraints on developers. Data-access is a critical component of data-intensive systems, as it determines their overall performance and functionality. While data access technical debts are getting attention from the research community, technical debts that affect performance are not well investigated. This study aims to identify, categorize, and validate data-access performance anti-patterns. We collected issues from NoSQL-based and polyglot persistence open-source data-intensive systems, implemented in Java programing language, and identified 14 new data access-performance anti-patterns categorized under seven high-level categories. We conducted a developer survey to evaluate the perceived relevance and criticality of the newly identified anti-patterns and found that Improper Handling of Node Failures , Using Synchronous Connection , and Inefficient Driver API performance anti-patterns are the most critical data-access performance anti-patterns. The study findings can help improve the quality of data-intensive software systems by raising awareness of practitioners about the impact of the data-access performance anti-patterns. At the same time, the findings will help quality assurance teams to prioritize the correction of performance anti-patterns based on their criticality.
computer science, software engineering
What problem does this paper attempt to address?