Studying the Characteristics of SQL-related Development Tasks: An Empirical Study

Daniel Alencar da Costa,Natalie Grattan,Nigel Stanger,Sherlock A. Licorish
DOI: https://doi.org/10.48550/arXiv.2301.10315
2023-01-25
Abstract:A key function of a software system is its ability to facilitate the manipulation of data, which is often implemented using a flavour of the Structured Query Language (SQL). To develop the data operations of software (i.e, creating, retrieving, updating, and deleting data), developers are required to excel in writing and combining both SQL and application code. The problem is that writing SQL code in itself is already challenging (e.g., SQL anti-patterns are commonplace) and combining SQL with application code (i.e., for SQL development tasks) is even more demanding. Meanwhile, we have little empirical understanding regarding the characteristics of SQL development tasks. Do SQL development tasks typically need more code changes? Do they typically have a longer time-to-completion? Answers to such questions would prepare the community for the potential challenges associated with such tasks. Our results obtained from 20 Apache projects reveal that SQL development tasks have a significantly longer time-to-completion than SQL-unrelated tasks and require significantly more code changes. Through our qualitative analyses, we observe that SQL development tasks require more spread out changes, effort in reviews and documentation. Our results also corroborate previous research highlighting the prevalence of SQL anti-patterns. The software engineering community should make provision for the peculiarities of SQL coding, in the delivery of safe and secure interactive software.
Software Engineering
What problem does this paper attempt to address?
The problem that this paper attempts to solve is about the characteristics of SQL development tasks, especially whether these tasks are special in the software development process. Specifically, the paper focuses on the following aspects: 1. **How many code changes are required for SQL development tasks and how long does it take to solve them?** - By analyzing issue reports, the researchers explored the differences in completion time and code change volume between tasks involving SQL code and those not involving SQL code. The study found that development tasks involving SQL usually take longer to complete and involve more code changes. 2. **How many code changes are required for SQL - related pull requests and how long does it take to merge them?** - The researchers further analyzed pull requests to understand the differences in merge time and code change volume between requests involving SQL code and those not involving SQL code. The results showed that although in most projects, pull requests involving SQL take longer to merge, this trend is not obvious, and in some projects it is even the opposite. 3. **Are SQL development tasks different from non - SQL development tasks in terms of the effort invested?** - In order to deeply understand the reasons for the differences in time consumption and code change volume between SQL development tasks and non - SQL development tasks, the researchers conducted a qualitative analysis. The results indicate that SQL development tasks usually involve a larger scope and require more effort for review and documentation writing. ### Specific research questions and methods - **RQ1: How many code changes are required for issue reports involving SQL code and how long does it take to solve them?** - **Method**: The researchers determined which issue reports involved SQL code by linking GitHub commit records and issue reports in JIRA. They used regular expressions to find SQL queries in the submitted patches. Then, they measured the time difference from the opening to the resolution of each issue report and calculated the number of lines of code added and deleted in all commits related to the issue report. - **Result**: Development tasks involving SQL are slightly longer in completion time than tasks not involving SQL, and involve significantly more code changes. - **RQ2: How many code changes are required for pull requests involving SQL code and how long does it take to merge them?** - **Method**: The researchers collected pull requests that were closed or merged and calculated the time from the creation to the merge of these requests. At the same time, they calculated the code change volume of each pull request. - **Result**: In most projects, pull requests involving SQL take longer to merge, but this trend is not significant, and in some projects it is even the opposite. - **RQ3: Are SQL development tasks different from non - SQL development tasks in terms of the effort invested?** - **Method**: The researchers conducted a qualitative analysis of 687 issue reports to understand the differences in effort investment between SQL development tasks and non - SQL development tasks. - **Result**: SQL development tasks usually involve a larger scope and require more effort for review and documentation writing. ### Conclusion This study reveals the particularities of SQL development tasks in the software development process through empirical analysis, especially in terms of completion time and code change volume. These findings are of great significance for improving software development tools and practices. For example, the quality assurance team can adjust the priority of code review, and software estimators can allocate resources more reasonably. In addition, the study also emphasizes the uniqueness of SQL development tasks in terms of effort investment, providing more in - depth insights for researchers and practitioners.