Abstract: Bug triaging refers to the process of assigning a bug to the most appropriate developer to fix. It becomes increasingly difficult as the size of software and the number of developers grow. In this paper, we propose a new framework for bug triaging that maps the words in bug reports (i.e., the term space) to their corresponding topics (i.e., the topic space). We propose a specialized topic modeling algorithm named multi-feature topic model (MTM), which extends Latent Dirichlet Allocation (LDA) for bug triaging. MTM considers the product and component information of bug reports to map the term space to the topic space. Finally, we propose an incremental learning method named TopicMiner, which considers the topic distribution of a new bug report to assign an appropriate fixer based on the fixer's affinity to the topics. We pair TopicMiner with MTM (TopicMiner$^{MTM}$). We have evaluated our solution on five large bug report datasets, namely GCC, OpenOffice, Mozilla, NetBeans, and Eclipse, containing a total of 227,278 bug reports. We show that TopicMiner$^{MTM}$ achieves top-1 and top-5 prediction accuracies of 0.4831-0.6868 and 0.7686-0.9084, respectively. We also compare TopicMiner$^{MTM}$ with Bugzie, LDA-KL, SVM-LDA, LDA-Activity, and Yang et al.'s approach. The results show that TopicMiner$^{MTM}$ on average improves the top-1 and top-5 prediction accuracies of Bugzie by 128.48 and 53.22 percent, LDA-KL by 262.91 and 105.97 percent, SVM-LDA by 205.89 and 110.48 percent, LDA-Activity by 377.60 and 176.32 percent, and Yang et al.'s approach by 59.88 and 13.70 percent, respectively.
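The abstract describes TopicMiner only at a high level: it takes the topic distribution of a new bug report and ranks candidate fixers by their affinity to those topics, learning incrementally as reports are fixed. The following is a minimal sketch of that idea under stated assumptions, not the paper's exact formulation. It assumes each report's topic distribution (`theta`) is already available (for example, inferred by MTM or plain LDA), that a developer's affinity is the mean topic distribution of the reports they have fixed, and that a report is scored against a developer by a dot product. All names (`TopicAffinityRanker`, `evaluate_incrementally`, `relative_improvement`) are hypothetical. The sketch also shows how top-k accuracy is computed and, assuming the "improves ... by X percent" figures are relative gains, how such a percentage is derived.

```python
from typing import Dict, List, Sequence, Tuple


class TopicAffinityRanker:
    """Hypothetical sketch of topic-affinity fixer ranking.

    Assumption (not the paper's exact formula): a developer's affinity
    vector is the mean topic distribution of the bug reports they fixed,
    and a new report is scored against each developer by a dot product.
    """

    def __init__(self, num_topics: int) -> None:
        self.num_topics = num_topics
        self.topic_sums: Dict[str, List[float]] = {}
        self.counts: Dict[str, int] = {}

    def update(self, fixer: str, theta: Sequence[float]) -> None:
        """Incremental step: fold one fixed report's topics into a profile."""
        sums = self.topic_sums.setdefault(fixer, [0.0] * self.num_topics)
        for k, p in enumerate(theta):
            sums[k] += p
        self.counts[fixer] = self.counts.get(fixer, 0) + 1

    def affinity(self, fixer: str) -> List[float]:
        """Mean topic distribution of the reports this fixer resolved."""
        n = self.counts[fixer]
        return [s / n for s in self.topic_sums[fixer]]

    def rank(self, theta: Sequence[float]) -> List[str]:
        """Known developers by descending score; ties broken by name."""
        scores = {
            dev: sum(a * p for a, p in zip(self.affinity(dev), theta))
            for dev in self.counts
        }
        return sorted(scores, key=lambda dev: (-scores[dev], dev))


def evaluate_incrementally(
    ranker: TopicAffinityRanker,
    stream: Sequence[Tuple[Sequence[float], str]],
    k: int,
) -> float:
    """Top-k accuracy: predict each report, then learn from its true fixer."""
    hits = 0
    for theta, fixer in stream:
        if fixer in ranker.rank(theta)[:k]:
            hits += 1
        ranker.update(fixer, theta)  # learn only after predicting
    return hits / len(stream)


def relative_improvement(new: float, baseline: float) -> float:
    """Percentage gain of `new` over `baseline`, e.g. 0.5 vs 0.25 -> 100.0."""
    return (new - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Toy data (invented for illustration): two developers, K = 2 topics.
    def seeded() -> TopicAffinityRanker:
        ranker = TopicAffinityRanker(num_topics=2)
        ranker.update("alice", [0.9, 0.1])
        ranker.update("bob", [0.2, 0.8])
        return ranker

    stream = [([0.8, 0.2], "alice"), ([0.3, 0.7], "bob"), ([0.6, 0.4], "bob")]
    top1 = evaluate_incrementally(seeded(), stream, k=1)
    top2 = evaluate_incrementally(seeded(), stream, k=2)
    print(round(top1, 3), top2)  # prints: 0.667 1.0
```

Evaluating in chronological order (predict first, then update) mimics an incremental setting and never lets the ranker see the fixer of the report it is predicting. In the toy run, the third report is ranked second for its true fixer, so it counts toward top-2 but not top-1 accuracy. A faithful reimplementation would substitute the affinity measure and topic inference actually defined for TopicMiner and MTM.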

Tackling topic general words in topic modeling

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

A Novel Topic Model for Automatic Term Extraction

Refine the Corpora Based on Document Manifold

Improving Automated Bug Triaging with Specialized Topic Model

Towards Generalising Neural Topical Representations

Modeling over Short Texts

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

A biterm topic model for short texts

Parsimonious Topic Models with Salient Word Discovery

Short Text Topic Modeling With Flexible Word Patterns

Dive into the Chasm: Probing the Gap between In- and Cross-Topic Generalization

Generative Topic Embedding: a Continuous Representation of Documents (Extended Version with Proofs)

BTM: Topic Modeling over Short Texts

Sys-TM: A Fast and General Topic Modeling System

Topic Modeling over Short Texts by Incorporating Word Embeddings

Topic Modeling in Semantic Space with Keywords

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

Collaborative Topic Modeling for Text Tensors

Topics in the Haystack: Enhancing Topic Quality through Corpus Expansion

Identifying Objective and Subjective Words via Topic Modeling