Code Clone Detection Based on Contrastive Learning

Chunli Xie, Yao Liang, Quanrun Lv, Zexuan Wan
DOI: https://doi.org/10.1109/seai62072.2024.10674596
2024-01-01
Abstract: In recent years, a growing number of deep learning-based code representation learning techniques have achieved great success in program analysis. These methods require large-scale labeled samples to train the model. Because manual labeling is costly and time-consuming, much of the data on open-source platforms has few or no labels, which limits the effectiveness and practicality of existing methods. To address this problem, we propose a contrastive learning method that learns source code representations from a small number of labeled samples. The method uses a gated recurrent unit (GRU) to extract code features and applies the learned representations to code clone detection. Experimental results show that the method outperforms state-of-the-art methods by 8% in F1 score on a benchmark dataset.
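The abstract does not state the exact loss or decision rule, so the following is only a minimal sketch of the general idea, assuming an InfoNCE-style contrastive loss and a cosine-similarity threshold for the clone decision. The function names (`cosine`, `info_nce`, `is_clone`), the temperature, the threshold, and the toy vectors are all hypothetical; in the paper the embeddings would come from a GRU encoder, which is omitted here.

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style loss (assumed, not taken from the paper).

    The loss is low when the anchor is closer to its positive
    (a clone) than to every negative (non-clone) sample.
    """
    pos = math.exp(cosine(anchor, positive) / temperature)
    neg = sum(math.exp(cosine(anchor, n) / temperature) for n in negatives)
    return -math.log(pos / (pos + neg))

def is_clone(u, v, threshold=0.8):
    """Hypothetical decision rule: clone if similarity meets the threshold."""
    return cosine(u, v) >= threshold

# Toy embeddings standing in for encoder outputs (made-up values).
anchor = [1.0, 0.0, 0.2]
clone = [0.9, 0.1, 0.25]
unrelated = [[-0.5, 1.0, 0.0], [0.0, -1.0, 0.3]]

# Correct pairing (anchor with its clone) versus a wrong pairing.
loss_good = info_nce(anchor, clone, unrelated)
loss_bad = info_nce(anchor, unrelated[0], [clone, unrelated[1]])
```

Training would minimize this loss so that embeddings of clone pairs move together and embeddings of unrelated fragments move apart; here `loss_good` is smaller than `loss_bad` because the anchor already lies close to its clone.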