Improving Fault Localization and Program Repair with Deep Semantic Features and Transferred Knowledge

Xiangxin Meng,Xu Wang,Hongyu Zhang,Hailong Sun,Xudong Liu
DOI: https://doi.org/10.1145/3510003.3510147
2022-01-01
Abstract:Automatic software debugging mainly includes two tasks of fault localization and automated program repair. Compared with the traditional spectrum-based and mutation-based methods, deep learning-based methods are proposed to achieve better performance for fault localization. However, the existing methods ignore the deep semantic features or only consider simple code representations. They do not leverage the existing bug-related knowledge from large-scale open-source projects either. In addition, existing template-based program repair techniques can incorporate project specific information better than deep-learning approaches. However, they are weak in selecting the fix templates for efficient program repair. In this work, we propose a novel approach called TRANSFER, which leverages the deep semantic features and transferred knowledge from open-source data to improve fault localization and program repair. First, we build two large-scale open-source bug datasets and design 11 BiLSTM-based binary classifiers and a BiLSTM-based multi-classifier to learn deep semantic features of statements for fault localization and program repair, respectively. Second, we combine semantic-based, spectrum-based and mutation-based features and use an MLP-based model for fault localization. Third, the semantic-based features are leveraged to rank the fix templates for program repair. Our extensive experiments on widely-used benchmark Defects4J show that TRANSFER outperforms all baselines in fault localization, and is better than existing deep-learning methods in automated program repair. Compared with the typical template-based work TBar, TRANSFER can correctly repair 6 more bugs (47 in total) on Defects4J.
What problem does this paper attempt to address?