Contrastive Learning of Functionality-Aware Code Embeddings

Yiyang Li, Hongqiu Wu, Hai Zhao
DOI: https://doi.org/10.1109/icassp49357.2023.10096337
2023-01-01
Abstract: Using pre-trained language models to obtain code embeddings is a common and effective practice in source code comprehension. However, language models pre-trained on natural language text fail to capture some intrinsic characteristics of code snippets, since programming languages differ substantially in form from natural language. In this paper, we present Functionality-aware Code Embeddings (FaCE), learned through contrastive learning. The key idea is that, when comprehending a code snippet, what counts is its functionality rather than the semantic meaning carried mainly by its entities (e.g., names of functions, variables, and classes). We construct positive samples and hard negative samples according to the functionality of code snippets, then pre-train our model with a standard contrastive learning framework. Experimental results and extensive analysis on two code-related benchmarks demonstrate the effectiveness of FaCE, which outperforms the baseline models by large margins.
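The contrastive objective mentioned in the abstract can be illustrated with a generic InfoNCE-style loss over one positive and a set of hard negatives. This is a minimal sketch under stated assumptions, not the paper's implementation: the abstract does not specify the loss, the similarity function, the temperature, or how samples are built, so the cosine similarity, the `temperature` value, the toy vectors, and the names `cosine` and `info_nce` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.05):
    """Generic InfoNCE loss for one anchor (assumed form, not from the paper).

    The positive is meant to share the anchor's functionality; the hard
    negatives are meant to look similar but behave differently.
    """
    # The positive logit goes first, followed by one logit per negative.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    # Negative log-probability assigned to the positive.
    return log_denominator - logits[0]


# Toy 2-D embeddings (illustrative values only).
anchor = [1.0, 0.0]
same_function = [0.9, 0.1]       # close to the anchor
different_function = [0.1, 0.9]  # far from the anchor

loss_good = info_nce(anchor, same_function, [different_function])
loss_bad = info_nce(anchor, different_function, [same_function])
```

In this toy setup `loss_good` is smaller than `loss_bad`: the loss is low when the embedding of the functionally equivalent snippet sits closer to the anchor than the hard negative does, which is the behavior a functionality-aware encoder would be trained toward.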