Graph Embedding Based Code Search in Software Project

Yanzhen Zou, Chunyang Ling, Zeqi Lin, Bing Xie
DOI: https://doi.org/10.1145/3275219.3275221
2018-01-01
Abstract: Source code search is one of the most important ways to study and reuse software projects. Currently, natural language based code search faces two main challenges: 1) More accurate search results are required as software projects evolve to become more heterogeneous and complex. 2) The semantic relationships between code elements (classes, methods, etc.) need to be illustrated so that developers can better understand their usage scenarios. To address these issues, we propose a novel approach to searching a software project's source code based on graph embedding. First, we automatically build a code graph from the project's source code and represent each code element in the code graph with a graph embedding. Second, we search the code graph with natural language questions and return the corresponding subgraph, composed of relevant code elements and their associated relationships, as the best answer to the search question. In our experiments, we select two well-known open source projects, Apache Lucene and POI, as examples on which to perform source code search tasks. The experimental results show that our approach improves F1-score by 10% over the existing shortest-path-based code graph search approach, while reducing average response time by a factor of about 60.
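The two steps described in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the code graph, the element names (DocumentWriter, Storage, and so on), the hand-written 2-D vectors standing in for learned graph embeddings, the substring matching of query words, and the "smallest total pairwise distance" selection rule are all assumptions made for illustration only.

```python
import itertools
import math

# Hypothetical code graph: (source element, target element, relationship).
EDGES = [
    ("DocumentWriter", "DocumentWriter.write", "has_method"),
    ("DocumentWriter", "Storage", "uses"),
    ("ReportWriter", "ReportWriter.write", "has_method"),
    ("ReportWriter", "Printer", "uses"),
]

# Hand-written vectors standing in for learned graph embeddings (assumption):
# elements that are close in the graph are placed close together.
EMBEDDING = {
    "DocumentWriter": (0.0, 0.0),
    "DocumentWriter.write": (0.1, 0.0),
    "Storage": (0.0, 0.2),
    "ReportWriter": (5.0, 5.0),
    "ReportWriter.write": (5.1, 5.0),
    "Printer": (5.0, 5.2),
}


def candidates(token):
    """Code elements whose name contains the query word (case-insensitive)."""
    return [name for name in EMBEDDING if token.lower() in name.lower()]


def spread(nodes):
    """Sum of pairwise embedding distances; smaller means more closely related."""
    return sum(
        math.dist(EMBEDDING[a], EMBEDDING[b])
        for a, b in itertools.combinations(nodes, 2)
    )


def search(query):
    """Pick one candidate per query word so the chosen set is as compact as
    possible in embedding space, then return the subgraph they induce."""
    pools = [c for c in (candidates(t) for t in query.split()) if c]
    if not pools:
        return set(), []
    best = min(itertools.product(*pools), key=spread)
    nodes = set(best)
    edges = [e for e in EDGES if e[0] in nodes and e[1] in nodes]
    return nodes, edges


if __name__ == "__main__":
    print(search("write storage"))
```

Comparing embedding distances avoids running a shortest-path computation on the graph for every candidate pair, which is one plausible reading of why an embedding-based search could respond faster than a shortest-path-based one; the sketch does not attempt to reproduce the reported numbers.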