Software visualization and deep transfer learning for effective software defect prediction

Jinyin Chen,Keke Hu,Yue Yu,Zhuangzhi Chen,Qi Xuan,Yi Liu,Vladimir Filkov
DOI: https://doi.org/10.1145/3377811.3380389
2020-06-27
Abstract:Software defect prediction aims to automatically locate defective code modules to better focus testing resources and human effort. Typically, software defect prediction pipelines are comprised of two parts: the first extracts program features, like abstract syntax trees, by using external tools, and the second applies machine learning-based classification models to those features in order to predict defective modules. Since such approaches depend on specific feature extraction tools, machine learning classifiers have to be custom-tailored to effectively build most accurate models. To bridge the gap between deep learning and defect prediction, we propose an end-to-end framework which can directly get prediction results for programs without utilizing feature-extraction tools. To that end, we first visualize programs as images, apply the self-attention mechanism to extract image features, use transfer learning to reduce the difference in sample distributions between projects, and finally feed the image files into a pre-trained, deep learning model for defect prediction. Experiments with 10 open source projects from the PROMISE dataset show that our method can improve cross-project and within-project defect prediction. Our code and data pointers are available at https://zenodo.org/record/3373409#.XV0Oy5Mza35.
What problem does this paper attempt to address?