Duplication Detection for Software Bug Reports Based on Topic Model

Jie Zou, Ling Xu, Mengning Yang, Meng Yan, Dan Yang, Xiaohong Zhang
DOI: https://doi.org/10.1109/icss.2016.16
2016-01-01
Abstract: Traditional approaches to detecting duplicate bug reports are usually based on the vector space model. However, their results are rarely satisfactory, because this model cannot capture the semantic correlation among bug reports written in natural language. A topic model, which models the underlying topics of texts, can address this weakness of the document similarity measures used in information retrieval: trained on a large corpus, it discovers the semantic topics in the texts and yields the semantic relatedness between documents. This paper therefore proposes a novel duplicate detection method based on a topic model. By selecting bug reports that contain execution information and combining them with bug classification information, the method not only overcomes the problems of high dimensionality, sparse data, and heavy noise, but also avoids the problems of synonymy and ambiguity in natural language. Compared with the traditional SVM method, the proposed approach achieves clearly higher recall and precision, which indicates its effectiveness.
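The abstract's central argument is that a vector space model cannot detect relatedness between reports that describe the same failure in different words, while topic distributions can. The sketch below illustrates only that contrast and is not the authors' method. The report texts, topic labels, and topic proportions are hypothetical, and the proportions are assumed to come from an already-trained topic model such as LDA. Jensen-Shannon divergence is also an assumed choice of distribution similarity, since the abstract does not say which measure the paper uses. The paper's use of execution information, bug classification, and its SVM comparison are not reproduced.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def tf_vector(text, vocab):
    """Raw term-frequency vector over a fixed vocabulary: the vector space model view."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) of two topic distributions; lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 whenever x_i > 0 because y = m.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def topic_similarity(p, q):
    """Map JS divergence to a similarity in [0, 1], where 1 means identical topic mixtures."""
    return 1.0 - js_divergence(p, q)


def rank_candidates(query, candidates):
    """Sort candidate report IDs by decreasing topic similarity to the query report."""
    return sorted(candidates, key=lambda rid: topic_similarity(query, candidates[rid]), reverse=True)


# Two hypothetical reports describing the same failure with no words in common.
report_a = "app crashes on startup"
report_b = "program aborts at launch"
vocab = sorted(set(report_a.split()) | set(report_b.split()))
print(cosine(tf_vector(report_a, vocab), tf_vector(report_b, vocab)))  # 0.0: VSM sees no overlap

# Hypothetical per-report topic distributions (e.g. over topics "crash", "UI", "network"),
# assumed to have been inferred by an already-trained topic model.
topics = {
    "B": [0.75, 0.20, 0.05],  # report_b: mostly the "crash" topic, like report_a
    "C": [0.05, 0.15, 0.80],  # an unrelated networking report
}
theta_a = [0.80, 0.15, 0.05]
print(rank_candidates(theta_a, topics))  # ['B', 'C']
```

In the term-frequency space the two synonymous reports have cosine similarity 0, so a vector space model would never pair them. Their topic mixtures, by contrast, are nearly identical (similarity above 0.99), so report B ranks first as a duplicate candidate. A real system would apply a threshold or take the top-k ranked reports. Both choices are left open here.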