Predicting Change Propagation Between Code Clone Instances by Graph-Based Deep Learning

Bin Hu, Yijian Wu, Xin Peng, Chaofeng Sha, Xiaochen Wang, Baiqiang Fu, Wenyun Zhao
DOI: https://doi.org/10.1145/3524610.3527912
IF: 3.762
2024-01-01
Empirical Software Engineering
Abstract: Code clones are widespread in open-source and industrial software projects and are still recognized as a threat to software maintenance, both because multiple clone instances must be maintained simultaneously and because inconsistent changes to clone instances can introduce defects. To alleviate this threat, it is essential to decide accurately and efficiently whether a change should be propagated between clone instances. Based on an exploratory study of clone change propagation in five well-known open-source projects, we find that a clone class can have both propagation-required changes and propagation-free changes, and thus fine-grained change propagation decisions are required. Based on these findings, we propose a graph-based deep learning approach to predict the change propagation requirements of clone instances. We develop a graph representation, named Fused Clone Program Dependency Graph (FC-PDG), to capture the textual and structural code contexts of a pair of clone instances along with the changes to one of them. Based on this representation, we design a deep learning model that uses a Relational Graph Convolutional Network (R-GCN) to predict the change propagation requirement. We evaluate the approach on a dataset constructed from 51 open-source Java projects, which includes 24,672 pairs of matched changes and 38,041 non-matched changes. The results show that the approach achieves high precision (83.1%), recall (81.2%), and F1-score (82.1%). Our further evaluation on three other open-source projects confirms the generality of the trained clone change propagation prediction model.
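The three reported metrics can be checked against one another. The sketch below assumes the F1-score is the standard harmonic mean of precision and recall, which the abstract does not state explicitly. The function and variable names are illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        # Avoid division by zero when both metrics are zero.
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Values reported in the abstract, converted from percentages to fractions.
reported_precision = 0.831
reported_recall = 0.812

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.1%}")  # prints "F1 = 82.1%"
```

Under that assumption, the computed value agrees with the reported F1-score of 82.1%.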