Toward Interactive Optimization of Source Code Differences: An Empirical Study of Its Performance

Tsukasa Yagi, Shinpei Hayashi
DOI: https://doi.org/10.1109/SCAM63643.2024.00030
2024-09-26
Abstract: A source code difference (diff) indicates changes made by comparing new and old source codes, and it can be utilized in code reviews to help developers understand the changes made to the code. Although many diff generation methods have been proposed, existing automatic methods may generate nonoptimal diffs, hindering reviewers from understanding the changes. In this paper, we propose an interactive approach to optimize diffs. Users can provide feedback for the points of a diff that should not be matched but are or parts that should be matched but are not. The edit graph is updated based on this feedback, enabling users to obtain a more optimal diff. We simulated our proposed method by applying a search algorithm to empirically assess the number of feedback instances required and the amount of diff optimization resulting from the feedback to investigate the potential of this approach. The results of 23 GitHub projects confirm that 92% of nonoptimal diffs can be addressed with less than four feedback actions in the ideal case.
Software Engineering
### What problem does this paper attempt to solve?

This paper addresses the problem that automatically generated source code differences (diffs) are often not optimal for code review. Specifically:

1. **Limitations of existing methods**:
   - Although many diff generation methods have been proposed, existing automatic methods may generate nonoptimal diffs, which can prevent reviewers from understanding code changes.
   - Nonoptimal diffs may group changes illogically, reducing the effectiveness of tools that provide additional information based on line differences.
2. **The need for interactive optimization**:
   - The authors propose an interactive method for optimizing diffs. Users give feedback on parts of the diff that are matched but should not be, or that should be matched but are not. The edit graph is updated from this feedback, so users obtain a more optimal diff.
   - Users correct wrong mappings through simple feedback actions, without having to specify the ideal final result.
3. **Evaluating optimization performance**:
   - To assess the potential of the method, the authors conducted an empirical study that simulates the method with a search algorithm, measuring how many feedback actions are needed to reach the optimal diff and how much each action improves the diff.
   - The results show that, in the ideal case, 92% of nonoptimal diffs can be addressed with fewer than four feedback actions.

### Main contributions of the paper

- An interactive method for optimizing source code differences.
- An empirical evaluation of the method's performance in the ideal and average cases.

### Explanation of formulas and symbols

- Edit graph \( G = (V, E) \), where \( V \) is the set of nodes and \( E \) is the set of edges.
- Node \( v_i^j \) represents the state in which the first \( i \) lines of the old version and the first \( j \) lines of the new version have been read.
- Edges are of three types: horizontal, vertical, and diagonal, representing delete, add, and match operations respectively.
- Feedback action \( A \subset (N^+ \cup \{*\})^2 \), where \( * \) represents a wildcard.

Together, these elements show how a diff can be optimized step by step through user feedback, improving the efficiency and accuracy of code review.
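
The edit graph and feedback actions described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes unit cost for delete and add edges and zero cost for match edges, and it interprets feedback as removing edges from the graph before the shortest path is recomputed. The names `diff`, `covers`, `forbid`, and `require` are hypothetical, and the wildcard semantics (a `*` covers every line on that side) are an assumption made for illustration.

```python
from math import inf

WILDCARD = "*"

def covers(pair, i, j):
    """True if a feedback pair (possibly with wildcards) covers old line i / new line j."""
    fi, fj = pair
    return fi in (WILDCARD, i) and fj in (WILDCARD, j)

def diff(old, new, forbid=(), require=()):
    """Cheapest path through the edit graph; returns (cost, matched line pairs).

    Lines are 1-indexed. `forbid` holds pairs that must not be matched
    (wildcards allowed); `require` holds concrete pairs that must be matched.
    Both interpretations are assumptions of this sketch.
    """
    n, m = len(old), len(new)
    req_old = dict(require)                   # old line -> required new line
    req_new = {j: i for i, j in require}      # new line -> required old line

    def can_match(i, j):
        # Diagonal edge exists only for equal lines not ruled out by feedback.
        if old[i - 1] != new[j - 1] or any(covers(p, i, j) for p in forbid):
            return False
        # A line with a required partner may not match any other line.
        return req_old.get(i, j) == j and req_new.get(j, i) == i

    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(n + 1):
        for j in range(m + 1):
            steps = []
            if i and j and can_match(i, j):
                steps.append((cost[i - 1][j - 1], "match"))   # diagonal edge
            if i and i not in req_old:
                steps.append((cost[i - 1][j] + 1, "delete"))  # consume old line i
            if j and j not in req_new:
                steps.append((cost[i][j - 1] + 1, "add"))     # consume new line j
            for c, op in steps:
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, op
    if cost[n][m] == inf:
        raise ValueError("feedback leaves no valid diff")

    # Walk back from the final node to collect the matched pairs.
    matches, i, j = [], n, m
    while (i, j) != (0, 0):
        op = back[i][j]
        if op == "match":
            matches.append((i, j))
            i, j = i - 1, j - 1
        elif op == "delete":
            i -= 1
        else:
            j -= 1
    return cost[n][m], matches[::-1]

old, new = ["A", "B", "A"], ["A"]
print(diff(old, new))                      # (2, [(3, 1)])
print(diff(old, new, forbid=[(3, 1)]))     # (2, [(1, 1)])
```

In this example both diffs have the same cost, so an automatic tool has no basis for preferring one. A single feedback action, forbidding the match between old line 3 and new line 1, moves the match to old line 1 without the user having to spell out the full desired diff.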