Hierarchical Semantic Graph Construction and Pooling Approach for Cross-language Code Retrieval

Mingyang Geng, Dezun Dong, Pingjing Lu
DOI: https://doi.org/10.1109/qrs-c60940.2023.00020
2023-01-01
Abstract: In today's diverse programming landscape, developers often need to implement identical functionality in multiple programming languages for different versions of a software system. This calls for automated cross-language code-to-code retrieval, which reduces costs and improves developer productivity. Existing methods for this task tend to overlook the intricate semantics and inherent hierarchies present in source code. In this paper, we introduce HERONPO, a novel approach that employs a hierarchical graph construction and pooling mechanism tailored to the cross-language code-to-code retrieval task. Our method first uses compilers to derive the Intermediate Representation (IR) from source code. We then construct hierarchical variable-centric flow graphs, which transform the IR into a detailed hierarchical semantic graph and yield a robust, learnable representation of code semantics. To enable end-to-end training on this hierarchical graph, we devise a pooling mechanism that matches the structure of the proposed graph. We evaluate HERONPO on a publicly available dataset comprising 78K solutions from programming contests, spanning Java, Python, and C. The experimental results are promising: HERONPO achieves an improvement of up to 10% in Mean Reciprocal Rank (MRR) across the Java-C, Java-Python, C-Python, C-Java, Python-Java, and Python-C retrieval tasks.
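The abstract reports results in Mean Reciprocal Rank (MRR). As a reminder of how that metric is computed, here is a minimal Python sketch using the standard definition: for each query, take the reciprocal of the rank of the first correct candidate (0 if none is retrieved), then average over all queries. The function names and the toy snippet identifiers are hypothetical and for illustration only; they are not taken from the paper or its dataset.

```python
from typing import Hashable, Sequence, Set


def reciprocal_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> float:
    """Return 1/rank of the first relevant candidate, or 0.0 if none appears."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
) -> float:
    """Average the reciprocal rank over all queries."""
    if len(rankings) != len(relevants):
        raise ValueError("each query needs exactly one set of relevant items")
    if not rankings:
        return 0.0
    total = sum(reciprocal_rank(r, rel) for r, rel in zip(rankings, relevants))
    return total / len(rankings)


# Toy example (hypothetical IDs): three queries, each with a ranked list of
# candidate snippets in the target language.
rankings = [
    ["c_12", "c_07", "c_33"],  # correct match ranked 1st -> RR = 1
    ["c_02", "c_41", "c_19"],  # correct match ranked 3rd -> RR = 1/3
    ["c_05", "c_08", "c_10"],  # correct match not retrieved -> RR = 0
]
relevants = [{"c_12"}, {"c_19"}, {"c_99"}]

mrr = mean_reciprocal_rank(rankings, relevants)
print(f"MRR = {mrr:.4f}")
```

In a cross-language setting such as Java-to-C retrieval, each query would be a program in the source language and the ranked list would contain candidate programs in the target language.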