Understanding Code Understandability Improvements in Code Reviews

Delano Oliveira, Reydne Santos, Benedito de Oliveira, Martin Monperrus, Fernando Castor, Fernanda Madeiral
2024-10-29
Abstract: Motivation: Code understandability is crucial in software development, as developers spend 58% to 70% of their time reading source code. Improving it can improve productivity and reduce maintenance costs. Problem: Experimental studies often identify factors influencing code understandability in controlled settings but overlook real-world influences like project culture, guidelines, and developers' backgrounds. Ignoring these factors may yield results with limited external validity. Objective: This study investigates how developers enhance code understandability through code review comments, assuming that code reviewers are specialists in code quality. Method and Results: We analyzed 2,401 code review comments from Java open-source projects on GitHub, finding that over 42% focus on improving code understandability. We further examined 385 comments specifically related to this aspect and identified eight categories of concerns, such as inadequate documentation and poor identifiers. Notably, 83.9% of suggestions for improvement were accepted and integrated, with fewer than 1% later reverted. We identified various types of patches that enhance understandability, from simple changes like removing unused code to context-dependent improvements such as optimizing method calls. Additionally, we evaluated four well-known linters for their ability to flag these issues, finding they cover less than 30%, although many could be easily added as new rules. Implications: Our findings encourage the development of tools to enhance code understandability, as accepted changes can serve as reliable training data for specialized machine-learning models. Our dataset supports this training and can inform the development of evidence-based code style guides. Data Availability: Our data is publicly available at https://codeupcrc.github.io.
Software Engineering
### What problem does this paper attempt to address?

This paper explores how developers improve code understandability during code review, and which understandability issues they care about in real-world settings. Specifically, it focuses on the following aspects:

1. **Gap between experimental research and practice**: Controlled experiments can isolate external factors, but in doing so they ignore influences such as project culture, guidelines, and developers' backgrounds. Because these influences are present in real projects, laboratory results may lack external validity.
2. **Improving code understandability in code review**: By analyzing code review comments from open-source Java projects on GitHub, the paper examines the understandability improvements that reviewers suggest. The authors assume that code reviewers act as the project's code quality specialists, so their suggestions reflect issues that matter in actual development.
3. **Classification of understandability problems and their fixes**: The researchers identified and classified the understandability problems raised in code review (such as incomplete or insufficient documentation, poor identifiers, and unnecessary code) and analyzed the concrete changes developers made to resolve them.
4. **Tool support and automation potential**: The paper evaluates how well existing linters detect these understandability problems and discusses how accepted code review changes could serve as training data for machine-learning models that identify and improve code understandability automatically.

### Research methods

To answer these questions, the paper uses the following methods:

- **Data collection**: 363 active open-source Java projects that practice code review were selected from GitHub, and 2,401 review comments were extracted for analysis.
- **Manual classification**: The comments were manually classified to identify those related to code understandability, and 385 of those comments were analyzed in detail.
- **Statistical analysis**: The proportion of each type of understandability problem was calculated, along with the proportion of suggestions that were accepted and integrated into the code base.
- **Tool evaluation**: Four popular linters were evaluated for their ability to detect the identified understandability problems.

### Main findings

- **Importance of code understandability**: More than 42% of the code review comments concern improving code understandability, which shows that it plays a major role in code review.
- **Common understandability problems**: The researchers identified eight categories of understandability problems, including incomplete documentation, poor identifiers, and unnecessary code.
- **Acceptance rate of suggestions**: 83.9% of the understandability suggestions were accepted and integrated into the code base, and fewer than 1% were later reverted.
- **Limitations of tools**: The four linters flag less than 30% of the understandability problems, although many of the missing checks could easily be added as new rules.

### Significance

This research offers practical insight into how code understandability is improved in practice, which can guide the development of more effective tools and practices for improving code quality and productivity. Its results and dataset can also inform evidence-based code style guides.
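
To make the problem categories named above more concrete, here is a minimal before/after sketch in Java. It is not taken from the paper's dataset: the class, the method names (`calc`, `discountedTotal`), and the scenario are hypothetical, chosen only to illustrate three of the reported concerns (poor identifiers, missing documentation, and unnecessary code) and the kind of behavior-preserving patch a reviewer might ask for.

```java
import java.util.List;

public class UnderstandabilityExample {

    // "Before": hypothetical code a reviewer might flag.
    // Single-letter identifiers, no documentation, and an unused variable.
    public static double calc(List<Double> l, double d) {
        double t = 0;
        int unused = 0; // never read: an example of unnecessary code
        for (double x : l) {
            t += x;
        }
        return t - t * d;
    }

    /**
     * "After": the same computation with the three concerns addressed.
     * Returns the sum of the given prices reduced by a fractional discount.
     *
     * @param prices       item prices to sum
     * @param discountRate fraction to subtract from the total, e.g. 0.5 for 50%
     * @return the discounted total
     */
    public static double discountedTotal(List<Double> prices, double discountRate) {
        double total = 0;
        for (double price : prices) {
            total += price;
        }
        return total - total * discountRate;
    }

    public static void main(String[] args) {
        List<Double> prices = List.of(10.0, 20.0);
        // Both versions compute the same value; only readability differs.
        System.out.println(calc(prices, 0.5));
        System.out.println(discountedTotal(prices, 0.5));
    }
}
```

Both methods return the same result for the same input, which reflects the nature of the simpler patches described in the study: they change how easily the code can be read, not what it does.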