Two-level clustering of UML class diagrams based on semantics and structure
Zongmin Ma, Zhongchen Yuan, Li Yan
DOI: https://doi.org/10.1016/j.infsof.2020.106456
IF: 3.9
2021-02-01
Information and Software Technology
Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Context</h3><p>Reusing software designs is an important part of software reuse. UML class diagrams are widely used in software design and have become the de facto standard. As a result, the reuse of UML class diagrams has received increasing attention. As the number of class diagrams stored in a reuse repository grows, retrieving them becomes time-consuming. Clustering can narrow the retrieval range and improve retrieval efficiency, but little work has been done on clustering UML class diagrams. This paper proposes a clustering approach for UML class diagrams.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Objective</h3><p>This paper proposes a two-level clustering of UML class diagrams, consisting of semantic clustering and structural clustering. The UML class diagrams stored in a reuse repository are clustered into a few domains based on semantics at the first level and into a few categories based on structure at the second level.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Method</h3><p>We propose a clustering algorithm named <em>CUFS</em>, which combines the ideas of partitioning and hierarchical clustering and uses a proposed feature similarity as the similarity measure between two clusters when merging clusters. A better feature representation of a cluster, namely the feature class diagram, is also proposed. To form each sub-cluster, the semantic and structural similarities between UML class diagrams are defined separately.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>A series of experimental results shows that the proposed feature similarity measure not only speeds up the clustering process but also expresses the degree of closeness between clusters for merging. 
The proposed algorithm shows good clustering quality and efficiency across different sizes and distributions of UML class diagrams.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion</h3><p>The proposed two-level clustering method considers both the semantics and the structure contained in a class diagram, and can flexibly adapt to different clustering requirements. The proposed clustering algorithm also performs better than other related algorithms in semantic, structural, and hybrid clustering.</p>
computer science, information systems, software engineering