CLUE: Customizing Clustering Techniques Using Machine Learning for Software Modularization

Fanyi Meng, Ying Wang, Chun Yong Chong, Hai Yu, Zhiliang Zhu
DOI: https://doi.org/10.1145/3671016.3674816
2024-01-01
Abstract: Software clustering is often used as a remodularization and architecture recovery technique to help developers simplify software maintenance tasks and ease the burden of software comprehension. While the choice of clustering technique can significantly influence the outcomes of remodularization, existing works have yet to explore exhaustively how suitable various clustering techniques are for different software projects. Although many prior works introduce new clustering techniques, their validations often focus on specific domains, which may restrict the generalizability of their findings. In this paper, we conduct an empirical study aimed at understanding the impact of software features and architectural problems on the effectiveness of various software clustering techniques. Leveraging our empirical findings, we propose CLUE, an approach that uses Machine Learning (ML) models to customize a suitable software clustering technique for a given software project. Our approach focuses on eight types of software clustering techniques and offers insights into their suitability based on a project's features and architectural problems. This analysis helps developers select, from the chosen pool of software clustering techniques, the one that achieves the best MoJoFM, c2c(cvg), or TurboMQ value for a specific project. We evaluate CLUE on 100 open-source software projects. The experimental results demonstrate that CLUE customizes clustering techniques with high accuracy, exceeding 90%.
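The abstract frames customization as choosing, for each project, the technique whose clustering scores best on a quality metric, and then learning that choice from software features. The sketch below illustrates that framing under stated assumptions and is not CLUE's implementation. It uses unweighted TurboMQ (the Mitchell and Mancoridis sum of cluster factors, CF_i = 2μ_i / (2μ_i + inter-cluster edge ends)) as the metric and a 1-nearest-neighbour classifier as a stand-in for the paper's unspecified ML models. The technique names, feature vectors, toy graph, and all function names are hypothetical.

```python
import math
from collections import defaultdict

def turbo_mq(edges, assignment):
    """Unweighted TurboMQ: sum over clusters of CF_i = 2*mu_i / (2*mu_i + sum_j (eps_ij + eps_ji))."""
    intra = defaultdict(int)  # mu_i: dependencies with both ends in cluster i
    inter = defaultdict(int)  # dependencies crossing cluster i's boundary, in either direction
    for src, dst in edges:
        ci, cj = assignment[src], assignment[dst]
        if ci == cj:
            intra[ci] += 1
        else:
            inter[ci] += 1
            inter[cj] += 1
    # Clusters with no internal dependencies contribute CF_i = 0.
    return sum(2 * intra[c] / (2 * intra[c] + inter[c])
               for c in set(assignment.values()) if intra[c] > 0)

def best_technique(edges, clusterings):
    """Hypothetical training label for one project: the technique whose result maximizes TurboMQ."""
    return max(clusterings, key=lambda name: turbo_mq(edges, clusterings[name]))

def predict_technique(training, features):
    """Hypothetical 1-nearest-neighbour stand-in for the ML model: reuse the closest project's label."""
    _, label = min(training, key=lambda item: math.dist(item[0], features))
    return label

# Toy dependency graph and three hypothetical techniques' clusterings of it.
EDGES = [("a", "b"), ("c", "d"), ("b", "c")]
CLUSTERINGS = {
    "tech_A": {"a": 0, "b": 0, "c": 1, "d": 1},  # two modules
    "tech_B": {"a": 0, "b": 0, "c": 0, "d": 0},  # one big module
    "tech_C": {"a": 0, "b": 1, "c": 2, "d": 3},  # every file alone
}

# Made-up (feature vector, best technique) pairs for previously analysed projects.
TRAINING = [((0.2, 0.8), "tech_B"), ((0.9, 0.1), "tech_A"), ((0.5, 0.5), "tech_C")]

if __name__ == "__main__":
    print(best_technique(EDGES, CLUSTERINGS))          # tech_A
    print(predict_technique(TRAINING, (0.85, 0.15)))   # tech_A
```

Here tech_A scores 2/3 + 2/3 = 4/3, beating the single-module result (1.0) and the all-singleton result (0), so it becomes the project's label. Swapping `turbo_mq` for a MoJoFM or c2c(cvg) scorer would change only the labelling step, not the classification step.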