How to cherry pick the bug report for better summarization?
Haoran Liu, Yue Yu, Shanshan Li, Mingyang Geng, Xiaoguang Mao, Xiangke Liao
DOI: https://doi.org/10.1007/s10664-021-10008-2
IF: 3.762
2021-09-03
Empirical Software Engineering
Abstract: Bug reports, as a frequently consulted software asset, are maintained and evolved in software communities. A large number of bug reports with complex discussions accumulate during software evolution. It has been proven that an accurate and concise summary can help developers reduce the time and effort spent going through the entire content of bug reports. Prior works select salient sentences that contain the most semantic information to form summaries. Their performance is limited because they consider neither the controversial standpoints among developers' comments nor the redundancy among sentences. In this paper, we study whether the opinions expressed in comments can be assessed from discussions, and which kinds of sentences are more likely to contain redundant information. Based on these studies, we propose two new factors, Believability and Informativeness. The former measures the degree to which a sentence is approved or disapproved within discussions, and the latter assesses the amount of information contained in the summary. Accordingly, we design BugSum, a supervised approach that generates summaries with a two-phase method. In the measuring phase, we propose a classification method that combines the advantages of Deep Pyramid CNN and Random Forest to assess the believability of sentences in bug reports. In the selection phase, BugSum integrates an auto-encoder network for semantic feature extraction with the believability of sentences, and optimizes the informativeness of generated summaries through a dynamic selection of salient sentences. Extensive experiments show that our approach outperforms 8 comparative approaches over two public datasets and one customized dataset. In particular, the probability of adding controversial sentences that are clearly disapproved by other developers into the summary is reduced by up to 64.7%.
computer science, software engineering