AST-path Based Compare-Aggregate Network for Code Clone Detection

Hongliang Liang, Lu Ai
DOI: https://doi.org/10.1109/ijcnn52387.2021.9534099
2021-01-01
Abstract: Code clone detection remains one of the main challenges in maintaining software projects. Recent research has shown that neural models based on abstract syntax trees (ASTs) can better represent code fragments. However, existing tree-based models are prone to the vanishing gradient problem because ASTs are large. In this paper, we represent a code fragment as the set of compositional paths in its AST and use this representation to train a classifier that detects clone pairs. Unlike Siamese-based models, which embed each code fragment separately and then compute their similarity in vector space, our compare-aggregate network processes the two code fragments jointly to obtain the vectors used for classification. To validate our model's ability to detect code clones, we evaluated it on the publicly available BigCloneBench dataset; the experimental results show that our model outperforms the state-of-the-art model ASTNN.
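To make the idea of an AST-path representation concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python source and the standard `ast` module (BigCloneBench itself consists of Java code), and it assumes that a "path" is the sequence of node type names from the root to a leaf, since the abstract does not define the paths exactly. The helper names `ast_paths` and `jaccard` are hypothetical, and the Jaccard overlap is only a simple stand-in for the paper's learned compare-aggregate classifier.

```python
import ast


def ast_paths(source):
    """Return the set of root-to-leaf paths, each a tuple of node type names.

    Assumption: root-to-leaf paths stand in for the paper's
    "compositional paths", whose exact definition is not given here.
    """
    tree = ast.parse(source)
    paths = set()

    def walk(node, prefix):
        prefix = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            # A node without AST children is a leaf: record the full path.
            paths.add(prefix)
        for child in children:
            walk(child, prefix)

    walk(tree, ())
    return paths


def jaccard(a, b):
    """Set overlap in [0, 1]; a toy substitute for a trained classifier."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Two fragments that differ only in identifier names, plus an unrelated one.
frag_a = "def add(x, y):\n    return x + y\n"
frag_b = "def total(a, b):\n    return a + b\n"
frag_c = "def show(items):\n    for i in items:\n        print(i)\n"

paths_a = ast_paths(frag_a)
paths_b = ast_paths(frag_b)
paths_c = ast_paths(frag_c)

print(jaccard(paths_a, paths_b))  # identical structure, so full overlap
print(jaccard(paths_a, paths_c))  # different structure, so partial overlap
```

Because the paths record only node types, renaming identifiers leaves the path set unchanged, which is the kind of structural similarity a clone detector needs to capture. A learned model would embed the paths and compare the two fragments jointly rather than counting exact matches.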