MCodeSearcher: Multi-View Contrastive Learning for Code Search.

Jia Li,Fang Liu,Yunfei Zhao,Ge Li,Zhi Jin
DOI: https://doi.org/10.1145/3609437.3609456
2023-01-01
Abstract:Code search has been a critical software development activity in facilitating developers to retrieve a proper code snippet from open-source repositories given a user intent. In recent years, large-scale pre-trained models have shown impressive performance on code representation learning and have achieved state-of-the-art performance on code search task. However, it is challenging for these models to distinguish the functionally equivalent code snippets with dissimilar implementations or the non-equivalent code snippets that look similar. Due to the diversity of the code implementations, it is necessary for the code search engines to identify the functional similarities or dissimilarities of source code so as to return the functionally matched source code for a given query. Besides, existing pre-trained models mainly focus on learning the semantic representations of code snippets. The semantic correlation between the code snippet and natural language query is not sufficiently exploited. An effective code search tool not only needs to understand the relationship between queries and code snippets but also needs to identify the relationship between diversified code snippets. To address these limitations, we propose a novel multi-view contrastive learning model MCodeSearcher for code retrieval, aiming at sufficiently exploiting (1) the semantic correlation between queries and code snippets, and (2) the relationship between functionally equivalent code snippets. To achieve this, we design contrastive training objectives from three views and pre-train our model with these objectives. The experimental results on five representative code search datasets show that our approach significantly outperforms the state-of-the-art methods.
What problem does this paper attempt to address?