DCCD: an Efficient and Scalable Distributed Code Clone Detection Technique for Big Code

Junaid Akram, Zhendong Shi, Majid Mumtaz, Ping Luo
DOI: https://doi.org/10.18293/seke2018-117
2018-01-01
Abstract: Code clone detection is an active research topic in software maintenance, reusability, and security. Techniques for detecting near-miss clones at different levels of granularity are still lacking, especially for big code. This paper presents the Distributed Code Clone Detection (DCCD) technique, which detects clones in big code bases based on feature extraction. We performed preprocessing, indexing, and clone detection on almost 27 TB of source code (324 billion LOC). DCCD is faster and more efficient than existing distributed indexing and clone detection techniques: it is 36 times faster than the Benjamin technique, which is 86 times faster than CCFinder. Those two techniques are also distributed but detect only Type-1 and Type-2 clones, whereas DCCD also detects Type-3 clones efficiently. Our approach is fast, flexible, and scalable, and provides 87% accurate results with authenticity, ease of accessibility, upgradeability, and maintainability.
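For readers unfamiliar with the clone types named in the abstract, the following is a minimal sketch of the standard distinction: Type-1 clones differ only in layout and comments, Type-2 clones additionally differ in identifiers and literals, and Type-3 (near-miss) clones also have statements added, removed, or changed. It is not the DCCD pipeline, which the abstract describes only as feature-extraction-based. The function names, the use of Python's `tokenize` and `difflib` modules, and the 0.7 similarity threshold are all assumptions made for illustration.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token kinds that carry only layout or comments; dropped before comparison.
_SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
         tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER}


def normalized_tokens(source, abstract_names):
    """Return the token strings of a Python snippet without layout/comments.

    If abstract_names is True, identifiers become "ID" and literals "LIT",
    so snippets that differ only in naming compare equal.
    """
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if abstract_names and tok.type == tokenize.NAME \
                and not keyword.iskeyword(tok.string):
            out.append("ID")
        elif abstract_names and tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")
        else:
            out.append(tok.string)
    return out


def classify(a, b, threshold=0.7):
    """Classify a pair of snippets (threshold is a hypothetical choice)."""
    if normalized_tokens(a, False) == normalized_tokens(b, False):
        return "Type-1"
    ta, tb = normalized_tokens(a, True), normalized_tokens(b, True)
    if ta == tb:
        return "Type-2"
    # Near-miss: abstracted token sequences are similar but not identical.
    if SequenceMatcher(None, ta, tb, autojunk=False).ratio() >= threshold:
        return "Type-3"
    return "not a clone"


ORIGINAL = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
COMMENTED = "def total(xs):\n    s = 0  # accumulator\n    for x in xs:\n        s += x\n    return s\n"
RENAMED = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
GUARDED = ("def add_all(values):\n    acc = 0\n    for v in values:\n"
           "        if v > 0:\n            acc += v\n    return acc\n")
UNRELATED = "print('hello')\n"
```

A pairwise comparison like this does not scale to the code volumes reported in the abstract; large-scale detectors typically rely on an index over extracted features instead, which is the kind of design the abstract refers to.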