AYAT: A Lightweight and Efficient Code Clone Detection Technique

Yasir Giani, Luo Ping, Syed Asad Shah
DOI: https://doi.org/10.1109/accc58361.2022.00015
2022-01-01
Abstract: Code clones make software maintenance more challenging, and detecting bugs in large systems may significantly increase maintenance costs. Although several clone detection techniques have been proposed over the years, their accuracy and scalability remain active research areas. Previously, Akram et al. proposed the hybrid technique DroidCC, which encodes tokens into 128-bit MD5 fingerprints and identifies clones by matching identical hash values. Encoding tokens into MD5 hash values takes more time because of the large fingerprint size, and DroidCC's large chunk size limits its accuracy. To overcome these weaknesses, we propose AYAT, a novel lightweight hybrid technique that detects clones at the fragment level. To speed up detection, we convert tokens into 32-bit polynomial values, and to improve accuracy we set the chunk size to 5 lines per chunk. We tested our technique on 10,968 Java projects comprising 4.98 million lines of code. Compared with the well-known DroidCC technique, AYAT is significantly faster and more efficient. Our evaluation shows that precision is significantly improved without sacrificing scalability, and AYAT outperforms DroidCC in every aspect.
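The pipeline the abstract outlines (tokenize, group lines into 5-line chunks, reduce each chunk to a 32-bit polynomial value, match identical values) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the tokenizer, the polynomial base, or how chunks are windowed, so the regular expression, the base 31, the token separator, the sliding window over non-blank lines, and all function names below are assumptions made purely for illustration.

```python
import re
from collections import defaultdict

BASE = 31            # assumed polynomial base; not given in the abstract
MASK = 0xFFFFFFFF    # keep every fingerprint within 32 bits
CHUNK_LINES = 5      # chunk size stated in the abstract

# Assumed tokenizer: identifiers, integer literals, or single symbols.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(line):
    """Split one source line into tokens, ignoring whitespace."""
    return TOKEN_RE.findall(line)


def poly_hash(tokens):
    """Fold a token sequence into a 32-bit polynomial hash value."""
    h = 0
    for tok in tokens:
        for byte in tok.encode("utf-8"):
            h = (h * BASE + byte) & MASK
        # Assumed separator so that ["ab", "c"] and ["a", "bc"] differ.
        h = (h * BASE + 0x1F) & MASK
    return h


def chunk_fingerprints(source):
    """Yield (start, fingerprint) for each 5-line window of non-blank lines.

    `start` is an index into the non-blank lines, not the raw file.
    """
    lines = [ln for ln in source.splitlines() if ln.strip()]
    for start in range(len(lines) - CHUNK_LINES + 1):
        window = lines[start:start + CHUNK_LINES]
        tokens = [t for ln in window for t in tokenize(ln)]
        yield start, poly_hash(tokens)


def find_clones(files):
    """Group chunk locations by fingerprint; return groups with 2+ members."""
    index = defaultdict(list)
    for name, source in files.items():
        for start, fp in chunk_fingerprints(source):
            index[fp].append((name, start))
    return [locs for locs in index.values() if len(locs) > 1]
```

Calling `find_clones` with a dictionary that maps file names to source text returns groups of `(file, start)` locations whose chunks share a fingerprint. Because a 32-bit value can collide, a practical detector would likely confirm each candidate group by comparing the underlying tokens; that verification step is omitted here.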