Mining Issue Trackers: Concepts and Techniques

Lloyd Montgomery, Clara Lüders, Walid Maalej
2024-07-12
Abstract: An issue tracker is a software tool used by organisations to interact with users and manage various aspects of the software development lifecycle. With the rise of agile methodologies, issue trackers have become popular in open and closed-source settings alike. Internal and external stakeholders report, manage, and discuss "issues", which represent different information such as requirements and maintenance tasks. Issue trackers can quickly become complex ecosystems, with dozens of projects, hundreds of users, thousands of issues, and often millions of issue evolutions. Finding and understanding the relevant issues for the task at hand and keeping an overview become difficult with time. Moreover, managing issue workflows for diverse projects becomes more difficult as organisations grow and more stakeholders get involved. To help address these difficulties, software and requirements engineering research has suggested automated techniques based on mining issue tracking data. Given the vast amount of textual data in issue trackers, many of these techniques leverage natural language processing. This chapter discusses four major use cases for algorithmically analysing issue data to assist stakeholders with the complexity and heterogeneity of information in issue trackers. The chapter is accompanied by a follow-along demonstration package with Jupyter Notebooks.
Software Engineering
What problem does this paper attempt to address?
With the widespread adoption of agile methodologies, issue trackers have become popular in both open-source and closed-source environments. As projects grow in scale and complexity, however, issue trackers accumulate large amounts of information, which makes it difficult to find and understand relevant issues, maintain an overview, and manage workflows across projects. Research in software and requirements engineering has proposed automated techniques based on mining issue tracker data to tackle these problems. The paper discusses four main use cases that apply natural language processing (NLP) techniques to issue data, helping stakeholders cope with the complexity and heterogeneity of information in issue trackers. Specifically, the paper aims to address the following problems:

1. **Information Overload**: As projects develop, a large number of issue reports accumulate in issue trackers, making it difficult for users to quickly find and understand relevant information.
2. **Information Complexity and Heterogeneity**: Issue trackers contain various types of information, such as requirements, tasks, maintenance, and user support. This diversity and complexity make the information harder to manage and understand.
3. **Workflow Management**: As organizations grow and more stakeholders get involved, managing the issue workflows of different projects becomes more challenging.
4. **Data Quality**: The flexibility and low entry barrier of issue trackers may lead to redundant, ambiguous, or conflicting data, affecting the quality of issue reports, the accuracy of issue classification, and the efficiency of issue resolution.

To address these issues, the paper explores how natural language processing techniques can be used to analyze issue data, including requirements quality analysis, issue evolution analysis, discussion analysis, and linking and traceability analysis. These techniques can help stakeholders better manage and utilize the information in issue trackers.
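
To make the linking use case concrete, the following is a minimal sketch of how candidate links between issues might be suggested from text alone. It is not the method from the paper or its notebooks: it assumes a plain bag-of-words representation of issue titles and cosine similarity, using only the Python standard library. The issue keys, titles, function names, and the 0.5 threshold are all hypothetical and chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty title cannot be similar to anything
    return dot / (norm_a * norm_b)


def candidate_links(issues, threshold=0.5):
    """Return (key_a, key_b, score) for issue pairs with similar titles.

    The threshold is an assumed value, not one taken from the paper.
    """
    vectors = {key: Counter(tokenize(title)) for key, title in issues.items()}
    keys = sorted(vectors)
    pairs = []
    for i, first in enumerate(keys):
        for second in keys[i + 1:]:
            score = cosine_similarity(vectors[first], vectors[second])
            if score >= threshold:
                pairs.append((first, second, round(score, 3)))
    # Highest-scoring candidates first
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical issue titles, invented for illustration only.
ISSUES = {
    "PROJ-1": "Login page crashes on empty password",
    "PROJ-2": "Crash on login page when password is empty",
    "PROJ-3": "Add dark mode to settings screen",
}

for first, second, score in candidate_links(ISSUES):
    print(first, second, score)
```

With these sample titles, only the two login-related issues share enough vocabulary to be flagged as a candidate pair. A realistic pipeline would likely add stop-word removal, stemming, term weighting, or learned embeddings, and would leave the final link decision to a stakeholder.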