Deep Semantics-Enhanced Neural Code Search
Ying Yin, Longfei Ma, Yuqi Gong, Yucen Shi, Fazal Wahab, Yuhai Zhao
DOI: https://doi.org/10.3390/electronics13234704
IF: 2.9
2024-11-30
Electronics
Abstract: Code search uses natural language queries to retrieve code snippets from a vast database, identifying those that are semantically similar to the query. This enables developers to reuse code and improves software development efficiency. Most existing code search algorithms focus on capturing semantic and structural features by learning from both text and code graph structures. However, these algorithms often struggle to capture deeper semantic and structural features within these sources, leading to lower accuracy in code search results. To address this issue, this paper proposes a novel semantics-enhanced neural code search algorithm called SENCS, which employs graph serialization and a two-stage attention mechanism. First, the program dependency graph of the code is transformed into a unique serialized encoding, and a bidirectional long short-term memory (LSTM) model learns the structural information of the code from the graph sequence to generate code vectors rich in structural features. Second, during the code feature fusion phase, a two-stage attention mechanism enhances the embedded vectors by assigning different weights to the various code features and capturing the significant information in each code feature sequence, resulting in code vectors rich in both semantic and structural information. To validate the performance of the proposed code search algorithm, extensive experiments were conducted on two widely used code search datasets, CodeSearchNet and JavaNet. The experimental results show that, compared to the best baseline code search model in the literature, the proposed SENCS algorithm improves the average code search accuracy metrics by 8.30% (MRR) and 17.85% (DCG), with an average improvement of 14.86% in the SR@1 metric. Experiments on these two open-source datasets demonstrate that SENCS achieves better search performance than state-of-the-art models.
Engineering, Electrical & Electronic; Computer Science, Information Systems; Physics, Applied
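
The abstract reports results in terms of MRR and SR@1 but does not define them. As a minimal sketch, assuming the usual definitions of mean reciprocal rank and success rate at k (the paper's exact formulations may differ), the two metrics could be computed as follows. The function names and the input convention (the 1-based rank of the correct snippet per query, or None when it was not retrieved) are illustrative assumptions, not taken from the paper. DCG is left out because its gain and discount conventions vary between papers.

```python
from typing import Optional, Sequence


def mean_reciprocal_rank(ranks: Sequence[Optional[int]]) -> float:
    """Average of 1/rank over all queries.

    Each entry is the 1-based rank of the correct snippet for one query,
    or None if it was not retrieved (contributes 0). Assumed convention.
    """
    if not ranks:
        raise ValueError("ranks must not be empty")
    total = sum(1.0 / r for r in ranks if r is not None)
    return total / len(ranks)


def success_rate_at_k(ranks: Sequence[Optional[int]], k: int) -> float:
    """Fraction of queries whose correct snippet appears in the top k."""
    if not ranks:
        raise ValueError("ranks must not be empty")
    if k < 1:
        raise ValueError("k must be at least 1")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return hits / len(ranks)


if __name__ == "__main__":
    # Hypothetical ranks for four queries; the third query missed entirely.
    example_ranks = [1, 2, None, 4]
    print(mean_reciprocal_rank(example_ranks))
    print(success_rate_at_k(example_ranks, 1))
```

Under these definitions, SR@1 counts only queries whose correct snippet is ranked first, while MRR also gives partial credit to lower-ranked hits, so the two metrics can move by different amounts for the same model.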