Automatically recognizing the semantic elements from UML class diagram images

Fangwei Chen, Li Zhang, Xiaoli Lian, Nan Niu
DOI: https://doi.org/10.1016/j.jss.2022.111431
IF: 3.5
2022-11-01
Journal of Systems and Software
Abstract:
Context: Design models are essential for multiple tasks in software engineering, such as consistency checking, code generation, and design-to-code tracing. Almost all of these tasks need a semantically analyzable model to represent the software architecture design, e.g., a UML class diagram. Unfortunately, many design models are stored as images and embedded in text-based documentation, impeding the usage and evolution of these models. Thus, identifying the semantic elements of design models from images is important. However, there are many kinds of design models, with different elements in diverse representations, which call for different approaches to semantic element extraction.
Objective: To get an overview of the commonly used design model types, we conduct a survey of both open-source communities and industry. We find that design model diagrams are usually embedded in documents as pictures (73.72%), and UML class diagrams are the most used type (55.43%). Considering that there are limited studies on automatically recognizing the semantic elements from class diagram images, we propose an approach, which we call ReSECDI.
Method: ReSECDI includes our customized design for extracting UML class diagram elements based on image processing technologies. We design a rectangle clustering method for class recognition, to address the challenge that the presentation of classes may vary due to the UML constraints and tools’ styles. We design a polygonal line merging method and a double-recognition-approximation method for relationship recognition, to deal with the impact of low resolution on the detection.
Results: We evaluate the applicability of ReSECDI on 30 images drawn by three popular UML tools and 50 diagrams collected from the open-source communities, and obtain promising performance.
Conclusion: ReSECDI can recognize all commonly used types of semantic elements. It has good applicability and can process images drawn by mainstream tools and stored at different resolutions.
Editor’s note: Open Science material was validated by the Journal of Systems and Software Open Science Board.
computer science, theory & methods, software engineering