Toward Interpretable Graph Tensor Convolution Neural Network for Code Semantics Embedding.

Jia Yang, Cai Fu, Fengyang Deng, Ming Wen, Xiaowei Guo, Chuanhao Wan
DOI: https://doi.org/10.1145/3582574
IF: 3.685
2023-01-01
ACM Transactions on Software Engineering and Methodology
Abstract: Intelligent deep learning-based models have made significant progress for automated source code semantics embedding, and current research works mainly leverage natural language-based methods and graph-based methods. However, natural language-based methods do not capture the rich semantic structural information of source code, and graph-based methods do not utilize rich distant information of source code due to the high cost of message-passing steps. In this article, we propose a novel interpretable model, called graph tensor convolution neural network (GTCN), to generate accurate code embedding, which is capable of comprehensively capturing the distant information of code sequences and rich code semantics structural information. First, we propose to utilize a high-dimensional tensor to integrate various heterogeneous code graphs with node sequence features, such as control flow and data flow. Second, inspired by the current advantages of graph-based deep learning and efficient tensor computations, we propose a novel interpretable graph tensor convolution neural network for learning accurate code semantic embedding from the code graph tensor. Finally, we evaluate three popular applications on the GTCN model: variable misuse detection, source code prediction, and vulnerability detection. Compared with current state-of-the-art methods, our model achieves higher scores with respect to the top-1 accuracy while costing less training time.
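The first step in the abstract, integrating heterogeneous code graphs into one tensor, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `build_graph_tensor`, the edge-type labels, and the toy four-statement program are all hypothetical, and the sketch assumes the simplest possible encoding, one binary adjacency matrix per edge type stacked along a leading axis.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]  # (source node index, destination node index)


def build_graph_tensor(
    num_nodes: int, edges_by_type: Dict[str, List[Edge]]
) -> List[List[List[int]]]:
    """Stack one adjacency matrix per edge type into a tensor of shape (R, N, N).

    Hypothetical helper for illustration only; R is the number of edge
    types and N is the number of nodes shared by all graphs.
    """
    tensor = []
    # Sort the edge types so the slice order is deterministic.
    for edge_type in sorted(edges_by_type):
        adjacency = [[0] * num_nodes for _ in range(num_nodes)]
        for src, dst in edges_by_type[edge_type]:
            adjacency[src][dst] = 1
        tensor.append(adjacency)
    return tensor


# Assumed toy program with four statements (nodes 0..3).
edges = {
    "control_flow": [(0, 1), (1, 2), (2, 3)],  # sequential execution order
    "data_flow": [(0, 3)],  # a value defined at node 0 is used at node 3
}
tensor = build_graph_tensor(4, edges)
```

In this toy example, nodes 0 and 3 are three hops apart in the control-flow slice but directly connected in the data-flow slice, which gives a rough intuition for why combining several graph views in one tensor can expose distant relationships without many message-passing steps.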