Software Defect Prediction via Transformer

Qihang Zhang, Bin Wu
DOI: https://doi.org/10.1109/ITNEC48623.2020.9084745
2020-06-01
Abstract: To enhance software reliability, software defect prediction is used to predict potential defects and to improve the efficiency of software examination. Traditional defect prediction methods mainly focus on designing static code metrics and building machine learning classifiers to predict pieces of code that are potentially defective. However, these manually extracted features do not contain the syntactic and semantic information of programs. This information is much more important than those metrics and can improve the accuracy of defect prediction. In this paper, we propose a framework called software defect prediction via transformer (DP-Transformer), which captures syntactic and semantic features from programs and uses them to improve defect prediction. Specifically, we first parse source code into abstract syntax trees (ASTs) and then select representative nodes from the ASTs to form token vectors. We then employ mapping and word embedding to convert the token vectors into numerical vectors and feed the numerical vectors to a transformer. The transformer automatically extracts syntactic and semantic features, which are eventually fed into a logistic regression classifier. We evaluate our method on seven open-source Java projects with known labels and use F-measure as the evaluation criterion. The experimental results show that, on average, the proposed DP-Transformer improves on the state-of-the-art method by 8%.
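The first two steps of the pipeline (AST parsing with node selection, then mapping tokens to integers) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper works on Java ASTs, whereas this sketch uses Python's standard `ast` module as a stand-in parser so that it runs without third-party dependencies. The choice of "representative" node types, the names `extract_tokens`, `build_vocab`, and `encode`, and the use of ID 0 for both padding and unseen tokens are all assumptions made for illustration.

```python
import ast

# Assumption: which node types count as "representative" is a choice made
# for this sketch, not the paper's selection rule for Java ASTs.
CONTROL_NODES = (ast.If, ast.For, ast.While, ast.Try, ast.Raise, ast.Return)

PAD_ID = 0  # reserved for padding; also used for unseen tokens in this sketch


def extract_tokens(source):
    """Return representative AST tokens in pre-order (depth-first) order."""
    tokens = []

    def visit(node):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            tokens.append(node.name)  # declarations: keep the declared name
        elif isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                tokens.append(func.id)  # plain call, e.g. print(x)
            elif isinstance(func, ast.Attribute):
                tokens.append(func.attr)  # method call, e.g. obj.run()
        elif isinstance(node, CONTROL_NODES):
            tokens.append(type(node).__name__)  # control flow: keep node type
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return tokens


def build_vocab(token_lists):
    """Map each distinct token to an integer ID, starting at 1."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def encode(tokens, vocab, max_len):
    """Convert tokens to a fixed-length ID vector (truncate, then pad)."""
    ids = [vocab.get(tok, PAD_ID) for tok in tokens[:max_len]]
    return ids + [PAD_ID] * (max_len - len(ids))


if __name__ == "__main__":
    sample = "def f(xs):\n    for x in xs:\n        if x:\n            print(x)\n"
    toks = extract_tokens(sample)
    vocab = build_vocab([toks])
    print(toks)
    print(encode(toks, vocab, max_len=6))
```

The fixed-length integer vectors produced by `encode` are the kind of input an embedding layer expects. The embedding, transformer, and logistic regression stages described in the abstract are not sketched here.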
Computer Science