Similar Code Retrieval Based on the Clustering of Structural Features

Ke-chao WANG, Tian-tian WANG, Zhi-fei WANG, Xiang-min REN, Hai-cheng LIN
DOI: https://doi.org/10.3969/j.issn.1671-1815.2015.11.042
2015-01-01
Abstract: Traditional graph-based approaches to similar code detection usually have high complexity and are limited in recognizing code variations. In this paper, we propose a similar code retrieval approach based on the clustering of structural features. Source code is represented as control dependence trees, and code normalization is performed to eliminate code variations, so that syntactically different but semantically similar code can be recognized. Vectors are then computed to describe the structural information of the source code, which reduces the difficult graph similarity problem to a simpler vector clustering problem and allows candidate similar code to be extracted quickly. Test results show that our method recognizes more code variations than the method proposed by Gabel et al.
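The pipeline described in the abstract (tree representation, normalization, vectorization, clustering) can be illustrated with a minimal sketch. This is not the authors' implementation: the node kinds in KINDS, the alias table in normalize, the Euclidean distance, and the greedy radius-based clustering are all assumptions chosen for illustration, since the abstract does not specify the feature set or the clustering algorithm.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field
from math import sqrt

# Hypothetical feature dimensions; the paper's actual features are not given here.
KINDS = ("loop", "branch", "assign", "call", "return")


@dataclass
class Node:
    """One node of a (simplified, hypothetical) control dependence tree."""
    kind: str
    children: list[Node] = field(default_factory=list)


def normalize(kind: str) -> str:
    """Map syntactic variants to one canonical kind (assumed alias table)."""
    aliases = {"for": "loop", "while": "loop", "if": "branch", "switch": "branch"}
    return aliases.get(kind, kind)


def feature_vector(root: Node) -> tuple[int, ...]:
    """Count normalized node kinds in the tree; kinds outside KINDS are dropped."""
    counts: Counter[str] = Counter()
    stack = [root]
    while stack:
        node = stack.pop()
        counts[normalize(node.kind)] += 1
        stack.extend(node.children)
    return tuple(counts[k] for k in KINDS)


def distance(a: tuple[int, ...], b: tuple[int, ...]) -> float:
    """Euclidean distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster(vectors: dict[str, tuple[int, ...]], radius: float) -> list[list[str]]:
    """Greedy clustering: join the first cluster whose representative is within radius."""
    clusters: list[tuple[tuple[int, ...], list[str]]] = []
    for name, vec in vectors.items():
        for rep, members in clusters:
            if distance(rep, vec) <= radius:
                members.append(name)
                break
        else:
            clusters.append((vec, [name]))
    return [members for _, members in clusters]


# Two fragments that differ only in syntax (for/if vs. while/switch), plus an unrelated one.
f1 = Node("for", [Node("assign"), Node("if", [Node("call")])])
f2 = Node("while", [Node("assign"), Node("switch", [Node("call")])])
f3 = Node("return")
vectors = {name: feature_vector(tree) for name, tree in (("f1", f1), ("f2", f2), ("f3", f3))}
groups = cluster(vectors, radius=1.0)
```

In this sketch, f1 and f2 normalize to the same vector and fall into one candidate group, while f3 is kept apart. Only fragments within a group would then need a more expensive pairwise comparison, which is the sense in which vector clustering replaces a graph similarity search.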