Tritor: Detecting Semantic Code Clones by Building Social Network-Based Triads Model

Deqing Zou, Siyue Feng, Yueming Wu, Wenqi Suo, Hai Jin
DOI: https://doi.org/10.1145/3611643.3616354
2023-01-01
Abstract: Code clone detection refers to finding functional similarities between two code fragments, and it is becoming increasingly important as software engineering evolves. Code cloning can increase maintenance costs and even propagate vulnerabilities, which harms software security. Numerous code clone detection methods have been proposed, including tree-based methods that are capable of detecting semantic code clones. However, because tree structures are complex, these methods are difficult to apply to large-scale clone detection. In this paper, we propose a scalable semantic code clone detector based on a semantically enhanced abstract syntax tree. Specifically, we add control flow and data flow details to the original tree and regard the enhanced tree as a social network. We then build a social network-based triads model that collects similarity features between two methods by analyzing the different types of triads within the network. After obtaining all features, we use them to train a machine learning-based code clone detector (i.e., Tritor). Our comparative experimental results show that Tritor is superior to SourcererCC, RtvNN, Deckard, ASTNN, TBCNN, CDLH, and SCDetector, and is on par with DeepSim and FCCA. As for scalability, Tritor is about 39 times faster than ASTNN, another current state-of-the-art tree-based code clone detector.
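To make the idea of treating an enhanced tree as a network of triads more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not specify which triad types or similarity features Tritor uses, so the function `triad_profile`, the toy node names, and the simplification of classifying each node triple by how many of its three pairs are linked (ignoring edge direction) are all assumptions made for illustration.

```python
from itertools import combinations


def triad_profile(nodes, edges):
    """Count node triples by how many of their three pairs are linked (0..3).

    Hypothetical helper: edge direction is ignored, which is a
    simplification of the triad types used in social network analysis.
    """
    linked = {frozenset(e) for e in edges if e[0] != e[1]}
    counts = [0, 0, 0, 0]
    for triple in combinations(sorted(nodes), 3):
        k = sum(frozenset(pair) in linked for pair in combinations(triple, 2))
        counts[k] += 1
    return counts


# Toy "tree" for one method: AST parent-child edges only (assumed example).
nodes = ["method", "decl", "if", "return"]
ast_edges = [("method", "decl"), ("method", "if"), ("method", "return")]

# Assumed data-flow edge added on top of the tree (the "enhancement").
flow_edges = [("decl", "if")]

tree_only = triad_profile(nodes, ast_edges)
enhanced = triad_profile(nodes, ast_edges + flow_edges)

print("tree only:", tree_only)
print("enhanced: ", enhanced)
```

In this toy case a plain tree can never contain a fully linked triple, while the added flow edge closes one. Under the stated assumptions, profiles of this kind computed for two methods could be compared and passed to a classifier as features.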