Are Duplicates Really Harmful? An Empirical Study on Bug Report Summarization Techniques

Rui Hao, Yuying Li, Yang Feng, Zhenyu Chen
DOI: https://doi.org/10.1002/smr.2424
2022-01-01
Journal of Software
Abstract: Recent research has shown that duplicate bug reports can provide helpful information to assist developers in software tasks such as fault localization and program fixing, but thoroughly reading duplicate bug reports is time-consuming and inefficient. Summarization is a possible way to obtain the essential information quickly. However, applying existing summarization techniques to duplicate bug reports raises several challenges. Duplicate bug reports describe the same problem from different perspectives and vary in quality, content, and writing style. Moreover, the need to understand code snippets and the semantic gap between natural and programming languages make summary generation even more difficult. In this paper, we therefore investigate whether state-of-the-art summarization approaches can overcome these challenges and generate effective summaries for duplicate bug reports. We collected more than 8,000 groups of duplicate reports from GitHub and manually labeled 60 groups with 149 reports for the evaluation. The results show that although the existing summarization approaches can work on duplicate bug reports, they differ significantly when it comes to code snippet summarization. Moreover, several methods can be very slow when summarizing long bug reports. Our study provides insights and guidelines for choosing suitable summarization approaches in different scenarios.