Assessing and Improving Dataset and Evaluation Methodology in Deep Learning for Code Clone Detection

Haiyang Li, Qing Gao, Shikun Zhang
DOI: https://doi.org/10.1109/issre59848.2023.00044
2023-01-01
Abstract: Code clone detection is the task of identifying whether two code snippets are semantically identical. In recent years, deep learning models have shown high performance in detecting Type-3 and Type-4 code clones and have received increasing attention from the research community. However, compared with the attention researchers give to model design, there is little research on the quality of the datasets and on the evaluation methodology (the way the dataset is divided into training and test sets), which poses a challenge to the credibility of deep learning models. In this paper, we conduct experiments to evaluate the performance of existing state-of-the-art models from multiple perspectives. We also release two new datasets for code clone detection, ConBigCloneBench and Google-CodeJam2, based on the existing datasets BigCloneBench and GoogleCodeJam, respectively. Our experiments show that the F1 score of the same model decreases by up to 0.5 (from 0.9 to 0.4) across different evaluation perspectives and datasets, and that some models perform only about as well as a simple MLP model. We further analyze the reasons for the performance decline and provide suggestions for future research to improve the performance of deep learning models from multiple perspectives.
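The abstract does not spell out its evaluation perspectives, so the following is only an illustrative sketch, not the authors' method. It shows one way the division into training and test sets can be controlled: snippets are assigned to the test set by group (for example, a functionality or problem ID), so that no group appears on both sides. It also shows how an F1 score is computed from confusion counts. The names `split_by_group` and `f1_score`, and the `(snippet_id, group_id)` input layout, are assumptions made for this example.

```python
import random
from collections import defaultdict


def f1_score(tp, fp, fn):
    """F1 = harmonic mean of precision and recall, from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def split_by_group(snippets, test_fraction=0.2, seed=0):
    """Split snippets so that no group is shared between train and test.

    snippets: list of (snippet_id, group_id) pairs (hypothetical layout).
    Returns (train_ids, test_ids).
    """
    groups = defaultdict(list)
    for snippet_id, group_id in snippets:
        groups[group_id].append(snippet_id)

    # Shuffle whole groups, not individual snippets, then hold some out.
    group_ids = sorted(groups)
    random.Random(seed).shuffle(group_ids)
    n_test = max(1, int(len(group_ids) * test_fraction))
    test_groups = set(group_ids[:n_test])

    train_ids = [s for g in group_ids if g not in test_groups for s in groups[g]]
    test_ids = [s for g in group_ids if g in test_groups for s in groups[g]]
    return train_ids, test_ids
```

With a purely random split over pairs, snippets from the same group can land in both sets, which may make test results look better than they would on unseen functionality; a group-wise split like this one is a common way to check for that effect.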