Abstract: An issue tracker is a software tool used by organisations to interact with users and manage various aspects of the software development lifecycle. With the rise of agile methodologies, issue trackers have become popular in open and closed-source settings alike. Internal and external stakeholders report, manage, and discuss "issues", which represent different kinds of information such as requirements and maintenance tasks. Issue trackers can quickly become complex ecosystems, with dozens of projects, hundreds of users, thousands of issues, and often millions of issue evolutions. Finding and understanding the issues relevant to the task at hand, and keeping an overview, becomes difficult over time. Moreover, managing issue workflows for diverse projects becomes more difficult as organisations grow and more stakeholders get involved. To help address these difficulties, software and requirements engineering research has suggested automated techniques based on mining issue tracking data. Given the vast amount of textual data in issue trackers, many of these techniques leverage natural language processing. This chapter discusses four major use cases for algorithmically analysing issue data to assist stakeholders with the complexity and heterogeneity of information in issue trackers. The chapter is accompanied by a follow-along demonstration package with Jupyter Notebooks.

What problem does this paper attempt to address?

With the widespread adoption of agile methodologies, issue trackers have become increasingly popular in both open-source and closed-source environments. As projects grow in scale and complexity, however, so much information accumulates in issue trackers that it becomes difficult to find and understand relevant issues, maintain an overview, and manage workflows across projects. Research in software and requirements engineering has therefore proposed automated techniques based on mining issue tracker data. The paper discusses four main use cases that apply natural language processing (NLP) to issue data, helping stakeholders cope with the complexity and heterogeneity of information in issue trackers. Specifically, it targets the following problems:

1. **Information Overload**: As projects develop, a large number of issue reports accumulate in issue trackers, making it difficult for users to quickly find and understand relevant information.
2. **Information Complexity and Heterogeneity**: Issue trackers contain various types of information, such as requirements, tasks, maintenance, and user support. This diversity makes the information harder to manage and understand.
3. **Workflow Management**: As organizations grow and more stakeholders get involved, managing the issue workflows of different projects becomes more challenging.
4. **Data Quality**: The flexibility and low entry barrier of issue trackers may lead to redundant, ambiguous, or conflicting data, affecting the quality of issue reports, the accuracy of issue classification, and the efficiency of issue resolution.

To address these problems, the paper explores how NLP techniques can be used to analyze issue data in four areas: requirements quality analysis, issue evolution analysis, discussion analysis, and linking and traceability analysis. These techniques can help stakeholders better manage and use the information in issue trackers.
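
Of the four analysis areas, linking is the easiest to illustrate in a few lines. The sketch below is not the chapter's method: it assumes a simple Jaccard similarity over word tokens as a stand-in for the NLP techniques the chapter covers. The issue IDs, issue texts, function names, and the 0.3 threshold are all hypothetical and chosen only for illustration.

```python
import re
from itertools import combinations


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_links(issues, threshold=0.3):
    """Return (id_a, id_b, score) for issue pairs scoring at or above the threshold.

    The threshold is an arbitrary illustrative value, not a recommended setting.
    """
    tokens = {issue_id: tokenize(text) for issue_id, text in issues.items()}
    pairs = []
    for a, b in combinations(sorted(tokens), 2):
        score = jaccard(tokens[a], tokens[b])
        if score >= threshold:
            pairs.append((a, b, score))
    # Highest-scoring pairs first, so a reviewer sees the likeliest links on top.
    return sorted(pairs, key=lambda p: p[2], reverse=True)


# Hypothetical issue summaries, invented for this example.
issues = {
    "PROJ-1": "Login page crashes when password is empty",
    "PROJ-2": "Crash on login page with empty password field",
    "PROJ-3": "Add dark mode to the settings screen",
}
links = candidate_links(issues)
for a, b, score in links:
    print(f"{a} <-> {b}: {score:.2f}")
```

Token overlap of this kind misses paraphrases and treats "crash" and "crashes" as different words, which is one reason more capable text representations are usually preferred in practice. The output is best read as a list of candidates for a human to confirm, not as established links.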

Mining Issue Trackers: Concepts and Techniques
