Boreas: an Accurate and Scalable Token-Based Approach to Code Clone Detection

Yang Yuan, Yao Guo
DOI: https://doi.org/10.1145/2351676.2351725
2012-01-01
Abstract: Detecting code clones in a program has many applications in software engineering and related fields. In this paper, we present Boreas, an accurate and scalable token-based approach to code clone detection. Boreas introduces a novel counting-based method to define characteristic matrices, which describe program segments distinctly and effectively for the purpose of clone detection. We conducted experiments on the source code of JDK 7 and Linux kernel 2.6.38.6. Experimental results show that Boreas matches the detection accuracy of Deckard, a recently proposed syntax-based tool, while reducing execution time by more than an order of magnitude.