Leveraging Data Mining Algorithms to Recommend Source Code Changes

AmirHossein Naghshzan, Saeed Khalilazar, Pierre Poilane, Olga Baysal, Latifa Guerrouj, Foutse Khomh
2023-04-30
Abstract: Context: Recent research has used data mining to develop techniques that can guide developers through source code changes. To the best of our knowledge, very few studies have investigated data mining techniques and/or compared their results with other algorithms or a baseline. Objectives: This paper proposes an automatic method for recommending source code changes using four data mining algorithms. We not only use these algorithms to recommend source code changes, but we also conduct an empirical evaluation. Methods: Our investigation includes seven open-source projects from which we extracted source change history at the file level. We used four widely used data mining algorithms, i.e., Apriori, FP-Growth, Eclat, and Relim, and compared them in terms of performance (Precision, Recall, and F-measure) and execution time. Results: Our findings provide empirical evidence that while some Frequent Pattern Mining algorithms, such as Apriori, may outperform other algorithms in some cases, the results are not consistent across all the software projects, most likely because of the nature and characteristics of the studied projects, in particular their change history. Conclusion: Apriori seems appropriate for large-scale projects, whereas Eclat appears to be suitable for small-scale projects. Moreover, FP-Growth seems to be an efficient approach in terms of execution time.
Software Engineering, Artificial Intelligence, Machine Learning
What problem does this paper attempt to address?
The main goal of this paper is to recommend source code changes by utilizing four different data mining algorithms (Apriori, FP-Growth, Eclat, and Relim) to assist software developers in being more efficient during software development and maintenance activities. Specifically, the research objectives include:

1. **Recommendation System**: To build a recommendation system that can predict which files are likely to be modified together based on past historical records, thereby providing modification suggestions to developers.
2. **Algorithm Comparison**: To compare the performance differences of four data mining algorithms (Apriori, FP-Growth, Eclat, and Relim) in recommending source code changes. These algorithms were applied to datasets from seven open-source projects, and their performance in terms of recommendation accuracy and execution time was evaluated.
3. **Empirical Study**: To conduct an empirical study to verify the effectiveness of these four algorithms and determine which algorithm is most suitable for projects of different scales. The study found that Apriori is suitable for large projects, while Eclat is more suitable for small projects; FP-Growth showed high efficiency in terms of execution time.
4. **Parameter Configuration**: To explore the impact of different support and confidence configurations on the recommendation results, ultimately selecting 0.1 confidence and 0.2 support as the optimal configuration.

In summary, this research aims to provide an effective method for recommending code changes during the software development process by comparing different data mining algorithms and techniques, thereby improving developer productivity and software quality.
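
To make the idea of co-change recommendation concrete, here is a minimal sketch in Python. It is not the authors' implementation and it is not a full Apriori, FP-Growth, Eclat, or Relim miner: it only counts single files and file pairs (the first two levels of an Apriori-style search) and derives rules of the form "if file A changes, file B tends to change too". The function names, the toy commit history, and the file names are all hypothetical; the default thresholds reuse the 0.2 support and 0.1 confidence values mentioned above.

```python
from collections import Counter
from itertools import combinations


def mine_pair_rules(transactions, min_support=0.2, min_confidence=0.1):
    """Mine rules lhs -> rhs between single files.

    Each transaction is the set of files changed in one commit (hypothetical
    input format). Returns {(lhs, rhs): (support, confidence)}.
    """
    n = len(transactions)
    file_counts = Counter()
    pair_counts = Counter()
    for changed in transactions:
        files = sorted(set(changed))
        file_counts.update(files)
        pair_counts.update(combinations(files, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of commits touching both files
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / file_counts[lhs]  # estimate of P(rhs | lhs)
            if confidence >= min_confidence:
                rules[(lhs, rhs)] = (support, confidence)
    return rules


def recommend(rules, changed_file):
    """Return files likely to change with changed_file, best confidence first."""
    hits = [(rhs, conf) for (lhs, rhs), (_, conf) in rules.items()
            if lhs == changed_file]
    return [name for name, _ in sorted(hits, key=lambda h: (-h[1], h[0]))]


def precision_recall_f1(recommended, actual):
    """Score a recommendation list against the files that really changed."""
    recommended, actual = set(recommended), set(actual)
    true_positives = len(recommended & actual)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy change history: one set of changed files per commit (made-up data).
history = [
    {"a.py", "b.py"},
    {"a.py", "b.py", "c.py"},
    {"a.py", "c.py"},
    {"d.py"},
    {"a.py", "b.py"},
]
rules = mine_pair_rules(history)
suggestions = recommend(rules, "a.py")
scores = precision_recall_f1(suggestions, {"b.py"})
```

In this toy history, a developer who edits `a.py` would be pointed to `b.py` first and `c.py` second, and the suggestions are then scored with the same Precision, Recall, and F-measure metrics the paper reports. A real comparison would replace `mine_pair_rules` with full frequent-itemset miners and also measure execution time.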