ASPDup: AST-Sequence-based Progressive Duplicate Code Detection Tool for Onsite Programming Code

Yaoshen Yu, Zhiqiu Huang, Yu Zhou, Weiwei Li, Yichao Shao
DOI: https://doi.org/10.1145/3457913.3457938
2020-11-01
Abstract: Duplicate code is a common bad smell that is usually refactored after detection to improve program quality. Locating duplicate code during the programming phase may reduce maintenance cost, but doing so requires comparing an incomplete code fragment against complete files, a scenario to which existing tools are hard to apply. In this paper, we propose an AST-sequence-based duplicate code detection approach for onsite programming code. The abstract syntax tree (AST) is extracted from the source code and transformed into an encoded sequence. A local sequence alignment algorithm is then used to find highly similar subsequences. After post-processing, similar regions between the two code fragments are identified from these subsequences. We have developed a prototype tool as a plugin for Visual Studio Code. Experimental results indicate that our approach is effective in finding highly similar regions between cross-granularity code fragments, which can facilitate duplicate code detection for incomplete onsite programming code.
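The pipeline described in the abstract (AST extraction, sequence encoding, local alignment) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes Python's built-in `ast` module as a stand-in parser, node-type names as the encoding, and a Smith-Waterman-style alignment with illustrative scoring values (`match`, `mismatch`, `gap`). The function names `encode` and `local_align` are hypothetical, the fragment must be syntactically parseable, and the post-processing step is omitted.

```python
import ast


def encode(source):
    """Flatten the AST of `source` into a pre-order list of node-type names."""
    seq = []

    def visit(node):
        seq.append(type(node).__name__)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return seq


def local_align(a, b, match=2, mismatch=-1, gap=-1):
    """Smith-Waterman-style local alignment (scoring values are assumptions).

    Returns (score, (a_start, a_end), (b_start, b_end)) with half-open ranges
    delimiting the best-scoring pair of similar subsequences.
    """
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until the local score drops to zero.
    i, j = best_pos
    end_a, end_b = i, j
    while i > 0 and j > 0 and score[i][j] > 0:
        diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
        if score[i][j] == diag:
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, end_a), (j, end_b)


# Usage: compare an in-progress fragment against a complete file.
fragment = "for x in items:\n    total += x\n"
complete = (
    "def total_of(items):\n"
    "    total = 0\n"
    "    for x in items:\n"
    "        total += x\n"
    "    return total\n"
)
frag_seq, file_seq = encode(fragment), encode(complete)
score, frag_range, file_range = local_align(frag_seq, file_seq)
print(score, frag_seq[frag_range[0]:frag_range[1]])
```

Because the alignment is local, the fragment does not need to match the whole file: only the best-matching region contributes to the score, which suits the cross-granularity comparison the abstract describes. Mapping the aligned sequence ranges back to source line ranges would be part of post-processing and is not shown here.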
Computer Science