Abstract: Bug assignment, or bug triage, aims to identify the appropriate developers to fix newly discovered bugs, thereby helping projects manage bugs more effectively. Several deep learning-based approaches have been proposed for automated bug assignment. These approaches treat automated bug assignment as a text classification task: the textual description of a bug report is used as the input, and the potential fixers are regarded as the output labels. Such approaches therefore depend on the classification performance of the underlying natural language processing and machine learning techniques. New word embedding and deep learning models continue to emerge, and the effectiveness of these approaches depends both on the deep learning model used for classification and on the word embedding model used to represent bug reports. However, prior research has not empirically evaluated the impact of different word embedding and deep learning models on automated bug assignment. In this paper, we conduct an empirical study to analyze the performance variations among 35 deep learning-based automated bug assignment approaches. These approaches combine five word embedding techniques, i.e., Word2Vec, GloVe, NextBug, ELMo, and BERT, with seven text classification models, i.e., TextCNN, LSTM, Bi-LSTM, LSTM with attention, Bi-LSTM with attention, MLP, and Naive Bayes. We evaluate these combinations on three benchmark datasets, namely Eclipse JDT, GCC, and Firefox, as well as on their merger, i.e., a cross-project dataset.
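To make the "bug assignment as text classification" framing concrete, the sketch below uses a plain multinomial Naive Bayes classifier (one of the seven models named above) with bag-of-words counts instead of learned embeddings. It is a minimal illustration under stated assumptions, not the study's implementation: the tokenizer, the class name `NaiveBayesTriager`, and the toy reports and developer names are all hypothetical. The input is the concatenated summary and description of a report, and the output labels are candidate fixers, ranked by score.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical preprocessing: lowercase + whitespace split only.
    return text.lower().split()


class NaiveBayesTriager:
    """Multinomial Naive Bayes with Laplace smoothing over bug report tokens.

    Each class label is a developer (potential fixer); the input is the
    concatenated summary and description of a bug report.
    """

    def fit(self, reports, fixers):
        self.class_counts = Counter(fixers)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, fixer in zip(reports, fixers):
            tokens = tokenize(text)
            self.word_counts[fixer].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {f: sum(c.values()) for f, c in self.word_counts.items()}
        self.n_reports = len(fixers)
        return self

    def log_scores(self, text):
        v = len(self.vocab)
        scores = {}
        for fixer, n in self.class_counts.items():
            score = math.log(n / self.n_reports)  # log prior P(fixer)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore words never seen in training
                count = self.word_counts[fixer][tok]
                # Laplace-smoothed log likelihood P(token | fixer)
                score += math.log((count + 1) / (self.total_words[fixer] + v))
            scores[fixer] = score
        return scores

    def rank(self, text):
        # Candidate fixers ordered from most to least likely.
        scores = self.log_scores(text)
        return sorted(scores, key=scores.get, reverse=True)


# Toy, made-up training data (not from the study's datasets).
train = [
    ("crash in compiler when optimizing loop", "alice"),
    ("loop optimization produces wrong code", "alice"),
    ("browser tab freezes on page load", "bob"),
    ("page rendering slow in new tab", "bob"),
]
model = NaiveBayesTriager().fit([t for t, _ in train], [f for _, f in train])

summary = "tab crash"
description = "page load freezes the browser"
ranking = model.rank(summary + " " + description)
print(ranking)  # ['bob', 'alice']
```

In the approaches studied, the bag-of-words counts would be replaced by a word embedding (e.g., Word2Vec or BERT) and the classifier by a model such as Bi-LSTM with attention; the ranked list of fixers is what the evaluation metrics consume.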
Our main observations are: (1) Bi-LSTM with attention and Bi-LSTM using ELMo significantly outperform the other deep learning models on bug assignment tasks in terms of top-k (k = 1, 5, 10) accuracy and MRR; (2) both the summary and the description of bug reports are useful for bug assignment, but the description is more useful than the summary; (3) the training corpus of the word embedding models has a significant impact on the performance of deep learning-based bug assignment methods. Our results show the importance of tuning the different components (e.g., word embedding model, classification model, and textual input) of deep learning-based automated bug assignment methods and provide important insights for practitioners and researchers.
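The two metrics reported above can be computed from each model's ranked list of candidate fixers. The sketch below uses the standard definitions: top-k accuracy is the fraction of bug reports whose actual fixer appears among the first k recommendations, and MRR averages the reciprocal rank of the actual fixer. It assumes a fixer missing from a ranking contributes 0 to MRR; the function names and the toy rankings are hypothetical.

```python
def top_k_accuracy(ranked_lists, true_fixers, k):
    """Fraction of bug reports whose actual fixer appears in the top-k recommendations."""
    hits = sum(1 for ranked, fixer in zip(ranked_lists, true_fixers) if fixer in ranked[:k])
    return hits / len(true_fixers)


def mean_reciprocal_rank(ranked_lists, true_fixers):
    """Average of 1/rank of the actual fixer (assumed 0 if the fixer is not ranked)."""
    total = 0.0
    for ranked, fixer in zip(ranked_lists, true_fixers):
        if fixer in ranked:
            total += 1.0 / (ranked.index(fixer) + 1)  # ranks are 1-based
    return total / len(true_fixers)


# Hypothetical rankings for three bug reports and their actual fixers.
ranked = [["a", "b", "c"], ["b", "a", "c"], ["c", "b", "a"]]
truth = ["a", "a", "b"]
print(top_k_accuracy(ranked, truth, 1))  # 0.3333333333333333
print(top_k_accuracy(ranked, truth, 2))  # 1.0
print(mean_reciprocal_rank(ranked, truth))  # 0.6666666666666666
```

Top-k accuracy rewards any hit within the first k positions equally, while MRR also distinguishes whether the correct fixer was ranked first or lower, which is why the two are usually reported together.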

How Do Injected Bugs Affect Deep Learning?

Towards Enhancing the Reproducibility of Deep Learning Bugs: An Empirical Study

Silent Bugs in Deep Learning Frameworks: An Empirical Study of Keras and TensorFlow

An Empirical Study on TensorFlow Program Bugs

Toward Understanding Deep Learning Framework Bugs

An Empirical Study of Bugs in Machine Learning Systems

Automated Identification of High Impact Bug Reports Leveraging Imbalanced Learning Strategies

Characterizing Performance Bugs in Deep Learning Systems

Understanding Bugs in Multi-Language Deep Learning Frameworks

On Reporting Performance and Accuracy Bugs for Deep Learning Frameworks: An Exploratory Study from GitHub

Towards Understanding the Challenges of Bug Localization in Deep Learning Systems

An Empirical Study on Bugs Inside PyTorch: A Replication Study

Demystifying Dependency Bugs in Deep Learning Stack

High-Impact Bug Report Identification with Imbalanced Learning Strategies

Manifesting Bugs in Machine Learning Code: An Explorative Study with Mutation Testing

gDefects4DL: A Dataset of General Real-World Deep Learning Program Defects

On the Effectiveness of Deep Vulnerability Detectors to Simple Stupid Bug Detection

A Comparative Study of Supervised Learning Algorithms for Re-opened Bug Prediction

How about Bug-Triggering Paths? - Understanding and Characterizing Learning-Based Vulnerability Detectors

An empirical assessment of different word embedding and deep learning models for bug assignment