Code classification with graph neural networks: Have you ever struggled to make it work?

Qingchen Yu, Xin Liu, Qingguo Zhou, Jianwei Zhuge, Chunming Wu
DOI: https://doi.org/10.1016/j.eswa.2023.120978
IF: 8.5
2023-07-26
Expert Systems with Applications
Abstract: Code classification is a meaningful task with plenty of practical applications. Combined with recently popular graph neural networks (GNNs), a body of research attempts to address the code classification problem with the help of fruitful achievements from the deep learning (DL) area. We systematically investigate the practices of existing works on the GNN-based code classification task and identify three important but often overlooked questions. (1) Existing works usually illustrate the effectiveness of GNNs in code classification tasks by contrasting them with non-graph baselines. However, the contribution of the message-passing mechanism, which is at the core of GNNs, has not been shown on its own. So how does the message-passing mechanism help code classification? (2) Various problem formulations, model architectures, and learning objectives have been suggested in the literature to apply GNNs to code classification. In practice, how should we choose among them? (3) One of the most prominent applications of code classification is automated vulnerability detection. However, learning-based vulnerability detection has not been widely accepted as a practical approach. How does GNN-based code classification perform in this task, especially in real-world scenarios? A comprehensive experimental study is conducted around these questions to evaluate the performance and feasibility of GNNs on different datasets and scenarios. Results suggest that it is not easy to make GNNs perform well on the code classification task, given the intricate nature of programs. Interesting findings about the effectiveness of GNNs with different training objectives are reported for the first time in this paper, providing insights that can inform future research on code classification with GNNs.
computer science, artificial intelligence; engineering, electrical & electronic; operations research & management science