An Empirical Study on Code Search Pre-trained Models: Academic Progresses Vs. Industry Requirements

Kuo Chi, Chuanyi Li, Jidong Ge, Bin Luo
DOI: https://doi.org/10.1145/3671016.3672580
2024-01-01
Abstract: With the rapid development of pre-trained source code models, code search has made fundamental advances. However, a thorough evaluation of how well academic code search models address the needs of industry has been overlooked. We propose to conduct a ground-breaking evaluation of existing code search models w.r.t. their adaptability, scalability, robustness, and semantic sensitivity. First, we extensively evaluate the influence of the queries' semantic attributes on search performance, and design strategies to reduce the impact of incomplete semantics. Then, we use variants of queries to test the models' adaptability and robustness. Next, we classify queries by search purpose to determine the cross-type search applicability of the models. Finally, we measure the effects of multilingual efficient fine-tuning on model performance, and provide a reliable way to reduce the costs of developing and deploying code search systems for the industry. These contributions help narrow the gap between academic progress and industry requirements of code search.