CMCD: Count Matrix Based Code Clone Detection

Yang Yuan, Yao Guo
DOI: https://doi.org/10.1109/apsec.2011.13
2011-01-01
Abstract: This paper introduces CMCD, a Count Matrix based technique for detecting clones in program code. The key concept behind CMCD is the Count Matrix, which is built by counting how often each variable occurs in situations specified by pre-determined counting conditions. Because the characteristics of the count matrix do not change when variables are renamed or even when statements are reordered, CMCD works well on many hard-to-detect code clones, such as clones with swapped statements or a few deleted lines, which are difficult for other state-of-the-art detection techniques. Using CMCD, we obtained the following results: (1) we successfully detected all 16 clone scenarios proposed by C. Roy et al.; (2) we discovered two clone clusters, each with three copies, among 29 student-submitted compiler lab projects; and (3) we identified 174 code clone clusters and a potential bug in the JDK 1.6 source files.
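The abstract describes the count matrix only at a high level, so the following Python sketch illustrates the idea under explicit assumptions. It parses Python source with the standard `ast` module rather than the languages analyzed in the paper, and its five counting conditions (`defined`, `used`, `in_condition`, `in_loop`, `in_arithmetic`) are hypothetical choices for illustration, not CMCD's actual conditions. Each variable gets one row of counts; the hypothetical `canonical` helper then discards names and sorts the rows, which is one simple way to make the comparison insensitive to renaming and statement order. The hypothetical `similarity` function pairs rows greedily by L1 distance, a simplification; an optimal assignment (for example, the Hungarian algorithm) would be the more principled choice, and the paper's own comparison method may differ.

```python
import ast
from collections import defaultdict

# Hypothetical counting conditions for illustration only; CMCD's own list may differ.
CONDITIONS = ("defined", "used", "in_condition", "in_loop", "in_arithmetic")


class CountMatrixBuilder(ast.NodeVisitor):
    def __init__(self):
        # One count vector (matrix row) per variable name.
        self.counts = defaultdict(lambda: [0] * len(CONDITIONS))
        self.in_condition = 0  # > 0 while visiting an if test
        self.in_loop = 0       # > 0 while visiting a for loop
        self.in_arith = 0      # > 0 while visiting a binary operation

    def bump(self, name, condition):
        self.counts[name][CONDITIONS.index(condition)] += 1

    def visit_Name(self, node):
        self.bump(node.id, "defined" if isinstance(node.ctx, ast.Store) else "used")
        if self.in_condition:
            self.bump(node.id, "in_condition")
        if self.in_loop:
            self.bump(node.id, "in_loop")
        if self.in_arith:
            self.bump(node.id, "in_arithmetic")

    def visit_If(self, node):
        self.in_condition += 1
        self.visit(node.test)  # only the test counts as "in_condition"
        self.in_condition -= 1
        for stmt in node.body + node.orelse:
            self.visit(stmt)

    def visit_For(self, node):
        self.in_loop += 1
        self.generic_visit(node)  # target, iterable, body and any else clause
        self.in_loop -= 1

    def visit_BinOp(self, node):
        self.in_arith += 1
        self.generic_visit(node)
        self.in_arith -= 1


def count_matrix(source):
    builder = CountMatrixBuilder()
    builder.visit(ast.parse(source))
    return dict(builder.counts)


def canonical(matrix):
    # Drop variable names and sort rows: renaming and statement order vanish.
    return sorted(tuple(row) for row in matrix.values())


def similarity(a, b):
    # Greedy row pairing by L1 distance (a simplification, not an optimal matching).
    rows_a, unused = canonical(a), canonical(b)
    mass = sum(map(sum, rows_a)) + sum(map(sum, unused))
    distance = 0
    for row in rows_a:
        if unused:
            best = min(unused, key=lambda r: sum(abs(p - q) for p, q in zip(row, r)))
            distance += sum(abs(p - q) for p, q in zip(row, best))
            unused.remove(best)
        else:
            distance += sum(row)  # row of a left unpaired
    distance += sum(map(sum, unused))  # rows of b left unpaired
    return 1.0 - distance / mass if mass else 1.0


original = """
total = 0
count = 0
for x in items:
    if x > 0:
        total = total + x
    count = count + 1
"""

# Renamed variables and swapped initializations.
clone = """
n = 0
s = 0
for v in values:
    if v > 0:
        s = s + v
    n = n + 1
"""

# The clone with one line deleted.
near_miss = clone.replace("    n = n + 1\n", "")

a, b = count_matrix(original), count_matrix(clone)
print(canonical(a) == canonical(b))            # True
print(similarity(a, b))                        # 1.0
print(similarity(a, count_matrix(near_miss)))  # about 0.87
```

Each row depends only on how a variable is used, not on its name or on where its statements appear, so renaming `total` to `s` and swapping the two initializations leave the canonical form unchanged. Deleting `n = n + 1` changes one row, which lowers the score below 1.0 without dropping it to zero.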
What problem does this paper attempt to address?