Abstract: Automated Construction of Reference Model (ACRM) automatically constructs reference models for software projects by assigning weights to classes, using the metadata of all software versions and historical maintenance records. By leveraging the generated reference models, ACRM can help developers select appropriate software clustering techniques in a more structured and rigorous manner. Because a software project and its underlying architecture often evolve without documentation, the architecture must be recovered from the software's implementation-level artifacts. Various software remodularization techniques exist, but they often suffer from inaccuracies, and evaluating their effectiveness is challenging because accurate "ground-truth" architectures, or reference models, are absent. Prior approaches to reference model construction are time-consuming and labor-intensive because they rely heavily on manual analysis by domain experts. Other existing approaches directly use the directory or package structure of the latest version, which can be unreliable because they lack in-depth analysis of the employed software structure. To address these limitations, we propose ACRM, which builds reference models from the full version history and maintenance records rather than from manual expert analysis or the latest directory structure alone. We evaluate ACRM through both quantitative and qualitative analyses. The experimental results show that the generated reference models are reasonable, as confirmed by the relationship between the proposed reference models and architectural smells or bugs. We also survey industry practitioners to gain insights from their practices and further validate the generated reference models. The survey shows that, on average, 87% of the participants agree with the reference models generated by ACRM. Moreover, we propose an improved metric, wc2c, which analyzes the strengths and weaknesses of different types of software clustering techniques using the proposed reference models of the given software. Finally, we discuss the potential benefits of using ACRM in the analyzed projects, particularly in terms of improving software quality, reducing maintenance costs, and enhancing developer productivity.
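
The abstract does not define wc2c. As a rough illustration only, the sketch below assumes it extends the cluster-to-cluster (c2c) overlap measure, |A ∩ B| / max(|A|, |B|), by letting per-class weights stand in for raw class counts. The function names, the class names, and the weight values are all hypothetical and are not taken from the paper.

```python
def c2c(cluster_a, cluster_b):
    """Unweighted cluster-to-cluster overlap: |A & B| / max(|A|, |B|)."""
    if not cluster_a or not cluster_b:
        return 0.0
    return len(cluster_a & cluster_b) / max(len(cluster_a), len(cluster_b))


def weighted_c2c(cluster_a, cluster_b, weights):
    """Assumed weighted variant: sum class weights instead of counting classes.

    Classes missing from `weights` default to weight 1.0 (an assumption).
    """
    def mass(classes):
        return sum(weights.get(name, 1.0) for name in classes)

    denominator = max(mass(cluster_a), mass(cluster_b))
    if denominator == 0:
        return 0.0
    return mass(cluster_a & cluster_b) / denominator


# Hypothetical clusters: one recovered by a clustering technique,
# one taken from a reference model.
recovered = {"Parser", "Lexer", "Token", "Logger"}
reference = {"Parser", "Lexer", "Token", "AstNode"}

# Hypothetical class weights, e.g. derived from version history.
weights = {"Parser": 3.0, "Lexer": 2.0, "Token": 1.0,
           "Logger": 0.5, "AstNode": 0.5}

plain_score = c2c(recovered, reference)
weighted_score = weighted_c2c(recovered, reference, weights)
print(plain_score, weighted_score)
```

Under these assumed weights, the two clusters disagree only on low-weight classes, so the weighted score comes out higher than the plain count-based score. This is the kind of distinction a weighted metric could draw between clustering techniques.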

CLUE: Customizing Clustering Techniques Using Machine Learning for Software Modularization

Mobile User Interface Pattern Clustering Using Improved Semi-Supervised Kernel Fuzzy Clustering Method

Improving Software Modularization Quality Through the Use of Multi-Pattern Modularity Clustering Algorithm

Using Multi‐pattern Clustering Methods to Improve Software Maintenance Quality

Program Source-Code Re-Modularization Using a Discretized and Modified Sand Cat Swarm Optimization Algorithm

Evolution-aware Constraint Derivation Approach for Software Remodularization

A Fast Clustering Algorithm for Modularization of Large-Scale Software Systems

Measuring the Refactoring Risk of Modules Using Software Clustering

Granularity Decision Method of Product Based on Intelligent Clustering Algorithm

Scheme Solving Technology for Clustering Optimization of Manufacturing Resources with Hybrid Granularities

Clustering Combination Method

Leveraging Design Rules to Improve Software Architecture Recovery

An Integrated Method for Flexible Platform Modular Architecture Design

Construction of Product Module Based on Similarity and Its Applications

Automated construction of reference model for software remodularization through software evolution

Some Issues on Object-oriented Program Clustering

Customized Multiple Clustering via Multi-Modal Subspace Proxy Learning

A Study of Applying Unsupervised Learning Methods for Document Clustering and Automatic Categorization of Software

Design pattern directed software clustering approach

Feature-Gathering Dependency-Based Software Clustering Using Dedication and Modularity

User Story Clustering in Agile Development: a Framework and an Empirical Study