Semantic Code Graph—An Information Model to Facilitate Software Comprehension

Krzysztof Borowski, Bartosz Balis, Tomasz Orzechowski
DOI: https://doi.org/10.1109/access.2024.3351845
IF: 3.9
2024-03-02
IEEE Access
Abstract: Software comprehension is becoming increasingly time-consuming due to the continual growth in the size of codebases. Consequently, it is becoming more critical to speed up the code comprehension process to aid in software maintenance and lower the associated costs. A crucial aspect of this process is understanding and preserving the high quality of the code dependency structure. While a variety of code structure models already exist, there is a surprising scarcity of models that closely represent the source code and focus on software comprehension. As a result, there are no readily available and easy-to-use tools to assist with dependency comprehension, refactoring, and quality monitoring of code. To address this gap, we introduce the Semantic Code Graph (SCG), an information model that offers a detailed abstract representation of code dependencies with a close link to the source code. We establish the critical properties of the SCG model and demonstrate its implementation for the Java and Scala languages. To validate the SCG model's usefulness in software comprehension, we compare it to nine other source code representation models. Additionally, we select 11 well-known and widely used open-source projects developed in Java and Scala and perform a range of software comprehension activities on them using three different code representation models: the proposed SCG, the Call Graph, and the Class Collaboration Network. We then qualitatively analyze the results to compare the performance of these models in terms of software comprehension capabilities. These activities encompass project structure comprehension, identifying critical project entities, interactive visualization of code dependencies, and uncovering code similarities through software mining. Our findings demonstrate that the SCG enhances software comprehension capabilities compared to the prevailing Class Collaboration Network and Call Graph models. Moreover, the SCG-based data analysis yields actionable software comprehension insights. We also release an open-source tool, scg-cli, to assist with result reproduction and further research. We believe that the work described is a step towards the next generation of tools that streamline code dependency comprehension and management.
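To make the idea of a dependency graph with a close link to the source code more concrete, the following is a minimal Python sketch, not the authors' implementation and not the scg-cli API. The `Node` and `Edge` classes, the kind labels (`CLASS`, `METHOD`, `DECLARATION`, `CALL`), the sample entities, and the use of in-degree as a proxy for "critical project entities" are all assumptions made for illustration; the abstract does not specify them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields: a unique id, an entity kind, and a source location.
    id: str
    kind: str      # e.g. "CLASS" or "METHOD" (illustrative labels)
    location: str  # "file:line", the link back to the source code


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    kind: str  # e.g. "DECLARATION" or "CALL" (illustrative labels)


def rank_by_in_degree(nodes, edges, edge_kinds=None):
    """Return (node id, in-degree) pairs, most depended-upon first.

    In-degree is an assumed stand-in for "criticality"; ties are broken
    alphabetically by node id so the ordering is deterministic.
    """
    counts = Counter(
        e.target for e in edges if edge_kinds is None or e.kind in edge_kinds
    )
    ranked = [(n.id, counts.get(n.id, 0)) for n in nodes]
    return sorted(ranked, key=lambda pair: (-pair[1], pair[0]))


# A made-up three-file project, used only to exercise the sketch.
nodes = [
    Node("app.Parser", "CLASS", "Parser.java:3"),
    Node("app.Parser.parse", "METHOD", "Parser.java:10"),
    Node("app.Main", "CLASS", "Main.java:1"),
    Node("app.Main.run", "METHOD", "Main.java:5"),
    Node("app.Cli.start", "METHOD", "Cli.java:8"),
]
edges = [
    Edge("app.Parser", "app.Parser.parse", "DECLARATION"),
    Edge("app.Main", "app.Main.run", "DECLARATION"),
    Edge("app.Main.run", "app.Parser.parse", "CALL"),
    Edge("app.Cli.start", "app.Parser.parse", "CALL"),
    Edge("app.Cli.start", "app.Main.run", "CALL"),
]

# Restricting to CALL edges gives a call-graph-like view of the same data.
ranking = rank_by_in_degree(nodes, edges, edge_kinds={"CALL"})
print(ranking[0])
```

Keeping edge kinds on a single graph means the same data can be filtered into narrower views (calls only, declarations only) or analyzed as a whole, which is one plausible reading of how a richer model could subsume a Call Graph style analysis.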
computer science, information systems; telecommunications; engineering, electrical & electronic