Learning Semantic Features for Software Defect Prediction by Code Comments Embedding

Xuan Huo, Yang, Ming Li, De-Chuan Zhan
DOI: https://doi.org/10.1109/icdm.2018.00133
2018-01-01
Abstract: Software Quality Assurance (SQA) is essential in software development, and many defect prediction methods based on machine learning have been proposed to identify defective modules. However, most existing defect prediction models do not provide good defect prediction results, and the semantic features reflecting defective patterns may not be well captured by traditional feature extraction methods. More information, such as code comments, should also be embedded to generate semantic features that reflect the functionality of the source code. How to embed code comments for defect prediction is therefore a major challenge; a further problem is that comments are often missing from source code in real-world applications. In this paper, we propose a novel defect prediction model named CAP-CNN (Convolutional Neural Network for Comments Augmented Programs), a deep learning model that automatically embeds code comments when generating semantic features from the source code for software defect prediction. To overcome the missing-comments problem, CAP-CNN uses a novel training strategy in which the network encodes and absorbs comment information to generate semantic features automatically during training, so testing modules do not need to contain comments. Experimental results on several widely used software data sets indicate that the comment features are able to improve defect prediction performance.