Just-in-time Defect Prediction Based on AST Change Embedding

Weiyuan Zhuang, Hao Wang, Xiaofang Zhang
DOI: https://doi.org/10.1016/j.knosys.2022.108852
IF: 8.139
2022-01-01
Knowledge-Based Systems
Abstract: Just-in-time (JIT) defect prediction helps developers quickly identify whether a change is defective. The features extracted from changes play an essential role in building an accurate prediction model. In recent years, code representation techniques have been considered effective for extracting semantic features from software source files. However, extracting semantic information from fragmentary changed code snippets remains a challenging problem. We propose ACE (AST Change Embedding), a new feature that represents code semantics based on abstract syntax trees (ASTs): we compare the ASTs of the source code before and after a change, extract AST change sequences, and map them into numeric vectors using word embedding. We also use a gating mechanism to build a gated hierarchical model, called GH-ACE, that combines traditional hand-crafted features with semantic features. We conduct experiments on within-project and cross-project defect prediction tasks and evaluate our model in both non-effort-aware and effort-aware scenarios. The results show that, on average, our model outperforms the best baseline method by 4.0 percent for within-project defect prediction and by 2.4 percent for cross-project defect prediction.
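To make the idea of an AST change sequence concrete, the following is a minimal sketch and not the authors' implementation. It assumes Python source code and the standard-library `ast` module purely for illustration, and it approximates tree differencing with a plain sequence diff over flattened node types. The function names `node_types` and `ast_change_sequence` and the `ADD_`/`DEL_` token prefixes are hypothetical. The resulting tokens are the kind of input one might then pass to a word embedding model.

```python
import ast
import difflib


def node_types(source: str) -> list[str]:
    """Flatten the AST of `source` into a pre-order list of node type names."""

    def visit(node):
        # Skip Load/Store/Del context markers; they carry no structural information.
        if isinstance(node, ast.expr_context):
            return
        yield type(node).__name__
        for child in ast.iter_child_nodes(node):
            yield from visit(child)

    return list(visit(ast.parse(source)))


def ast_change_sequence(before: str, after: str) -> list[str]:
    """Return tokens describing node types deleted from or added to the AST.

    Assumption: a sequence diff over flattened node types stands in for a
    real tree-differencing algorithm.
    """
    old, new = node_types(before), node_types(after)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag in ("delete", "replace"):
            changes += [f"DEL_{name}" for name in old[i1:i2]]
        if tag in ("insert", "replace"):
            changes += [f"ADD_{name}" for name in new[j1:j2]]
    return changes


if __name__ == "__main__":
    before = "def f(x):\n    return x\n"
    after = "def f(x):\n    if x is None:\n        return 0\n    return x\n"
    print(ast_change_sequence(before, after))
```

Because the tokens describe only what changed, the sequence stays short even for large files, which is one plausible reason to embed change sequences rather than whole files.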