A Large-Gap Clone Detection Approach Using Sequence Alignment Via Dynamic Parameter Optimization

Jinze Liu, Tao Wang, Chenhui Feng, Huaimin Wang, Dongsheng Li
DOI: https://doi.org/10.1109/access.2019.2940710
IF: 3.9
2019-01-01
IEEE Access
Abstract: Large-gap clones, i.e., clones in which the reused code has undergone many edits, are very common in software development practice and widespread in software systems, so detecting them is crucial. However, because of the large number of edits, most existing approaches fail to detect such clones effectively. This paper aims to find an effective approach for accurately detecting large-gap clones. We transform the code clone detection problem into a biological sequence alignment problem and propose a novel approach that combines code fingerprints with sequence alignment. The sequence alignment is based on the Smith-Waterman algorithm but is significantly improved by a dynamic parameter acquisition strategy. Furthermore, we design new, rational criteria for clone identification. The proposed approach is evaluated extensively and automatically on more than 10 million lines of code for general clone detection. We further conduct an empirical study on five large-scale Java projects to manually assess the approach's large-gap clone detection. The experimental results show that the proposed approach can effectively detect large-gap clones and exhibits good performance, while remaining competitive with existing advanced detection tools in general clone detection.
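To make the core idea concrete, here is a minimal sketch, not the authors' implementation. Each source line is reduced to a fingerprint (here, a truncated SHA-256 hash of the whitespace-normalized line, which is an assumption), and two fingerprint sequences are compared with the textbook Smith-Waterman local alignment. The scoring parameters `match=2`, `mismatch=-1` and `gap=-1` are fixed illustrative assumptions. The paper instead acquires its parameters dynamically, and this sketch reproduces neither that strategy nor the paper's clone-identification criteria. The function names `fingerprint` and `smith_waterman` are hypothetical.

```python
import hashlib
from typing import List, Sequence, Tuple


def fingerprint(lines: Sequence[str]) -> List[str]:
    """Map each non-empty source line to a short hash fingerprint.

    Assumption for illustration: a line is normalized by collapsing
    whitespace, and its fingerprint is the first 8 hex digits of SHA-256.
    """
    fps = []
    for line in lines:
        norm = " ".join(line.split())  # collapse runs of whitespace
        if norm:  # skip blank lines
            fps.append(hashlib.sha256(norm.encode("utf-8")).hexdigest()[:8])
    return fps


def smith_waterman(a: Sequence[str], b: Sequence[str],
                   match: int = 2, mismatch: int = -1,
                   gap: int = -1) -> Tuple[int, int, int]:
    """Classic Smith-Waterman local alignment with a linear gap penalty.

    The fixed scores are hypothetical defaults, not the paper's dynamic
    parameters. Returns (best_score, end_index_in_a, end_index_in_b),
    where the end indices are 1-based positions in the score matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    h = [[0] * cols for _ in range(rows)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = h[i - 1][j] + gap    # element of a aligned against a gap
            left = h[i][j - 1] + gap  # element of b aligned against a gap
            # Clamping at 0 is what makes the alignment local.
            h[i][j] = max(0, diag, up, left)
            if h[i][j] > best:
                best, best_i, best_j = h[i][j], i, j
    return best, best_i, best_j


if __name__ == "__main__":
    original = ["int sum = 0;", "for (int i = 0; i < n; i++) {",
                "sum += a[i];", "}", "return sum;"]
    # A "large-gap" style edit: two extra statements are inserted.
    edited = ["int sum = 0;", 'log("start");', "for (int i = 0; i < n; i++) {",
              "if (a[i] < 0) continue;", "sum += a[i];", "}", "return sum;"]
    score, end_a, end_b = smith_waterman(fingerprint(original), fingerprint(edited))
    print(score, end_a, end_b)
```

In the usage example, the edited fragment inserts two lines. Smith-Waterman absorbs them as gaps, so the local alignment still spans all five original lines (five matches at +2, minus two gaps at -1). Local rather than global alignment fits clone detection because a cloned fragment may be embedded in a larger, otherwise unrelated method.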
What problem does this paper attempt to address?