Abstract: Software applications, especially Enterprise Resource Planning (ERP) systems, are crucial to the day-to-day operations of many industries. It is therefore essential to maintain these systems effectively, using tools that can identify, diagnose, and mitigate their incidents. One promising data-driven approach is Subgroup Discovery (SD), a data mining technique that can automatically mine incident datasets and extract discriminant patterns to identify the root causes of issues. However, current SD solutions are limited in handling complex target concepts with multiple attributes organized hierarchically. To illustrate this scenario, we examine Java out-of-memory incidents, one of several possible applications. We have a dataset that describes these incidents, including their context and the types of Java objects occupying memory when it reaches saturation, with these types arranged hierarchically. This scenario motivates a novel Subgroup Discovery approach that can handle complex target concepts with hierarchies. To this end, we design a pattern syntax and a quality measure that ensure the identified subgroups are relevant, non-redundant, and resilient to noise. The quality measure builds on the Subjective Interestingness model, which incorporates prior knowledge about the data and promotes patterns that are both informative and surprising relative to that knowledge. We apply this framework to investigate out-of-memory errors and demonstrate its usefulness in incident diagnosis. We present an empirical study to validate the effectiveness of our approach and the quality of the identified patterns. The source code and data used in the evaluation are publicly accessible, ensuring transparency and reproducibility.
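To make the idea of a hierarchical target more concrete, the following is a minimal sketch and not the authors' implementation. It rolls the heap share of each leaf Java type up to its ancestors in a type hierarchy, then scores every pair of a context condition and a hierarchy node with a simple mean-shift measure, sqrt(n) times the difference between the subgroup mean and the overall mean. This measure is only a stand-in: the Subjective Interestingness model named in the abstract is not implemented here. The hierarchy, the incident records, and all function names are hypothetical and chosen for illustration.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical type hierarchy: child -> parent (None marks the root).
PARENT = {
    "java": None,
    "java.util": "java",
    "java.lang": "java",
    "java.util.HashMap": "java.util",
    "java.util.ArrayList": "java.util",
    "java.lang.String": "java.lang",
}

# Hypothetical incidents: context attributes plus heap share per leaf type.
INCIDENTS = [
    ({"module": "billing", "env": "prod"},
     {"java.util.HashMap": 0.8, "java.util.ArrayList": 0.1, "java.lang.String": 0.1}),
    ({"module": "billing", "env": "test"},
     {"java.util.HashMap": 0.7, "java.util.ArrayList": 0.1, "java.lang.String": 0.2}),
    ({"module": "reporting", "env": "prod"},
     {"java.util.HashMap": 0.1, "java.util.ArrayList": 0.3, "java.lang.String": 0.6}),
    ({"module": "reporting", "env": "test"},
     {"java.util.HashMap": 0.2, "java.util.ArrayList": 0.3, "java.lang.String": 0.5}),
]


def roll_up(leaf_shares):
    """Add each leaf share to the leaf itself and to all of its ancestors."""
    totals = defaultdict(float)
    for node, share in leaf_shares.items():
        while node is not None:
            totals[node] += share
            node = PARENT[node]
    return dict(totals)


def mean_share(rows, node):
    """Average share of one hierarchy node over a list of rolled-up incidents."""
    return sum(row.get(node, 0.0) for row in rows) / len(rows)


def best_patterns(incidents, top_k=3):
    """Rank (condition, hierarchy node) pairs by a mean-shift score.

    The score sqrt(n) * (subgroup mean - overall mean) is an assumed,
    simplified quality measure, not Subjective Interestingness.
    """
    rolled = [(ctx, roll_up(leaves)) for ctx, leaves in incidents]
    all_rows = [row for _, row in rolled]
    conditions = sorted({(k, v) for ctx, _ in rolled for k, v in ctx.items()})
    scored = []
    for key, value in conditions:
        sub = [row for ctx, row in rolled if ctx.get(key) == value]
        for node in PARENT:
            shift = mean_share(sub, node) - mean_share(all_rows, node)
            scored.append((sqrt(len(sub)) * shift, f"{key}={value}", node))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    for score, condition, node in best_patterns(INCIDENTS):
        print(f"{condition:18s} {node:22s} {score:.3f}")
```

On this toy data the top-ranked pattern pairs the condition module=billing with the node java.util.HashMap. Its parent java.util also ranks among the highest scores, which hints at why a measure that accounts for redundancy across hierarchy levels, as the abstract describes, is useful.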

Mining Extremely Small Data Sets with Application to Software Reuse

Mining Noise-Tolerant Frequent Closed Itemsets in Very Large Database.

Interactive Rare-Category-of-Interest Mining from Large Datasets

Using Support Vector Machines for Mining Regression Classes in Large Data Sets

Data Mining for Software Engineering

Learning from Distribution-Changing Data Streams Via Decision Tree Model Reuse

Towards One Reusable Model for Various Software Defect Mining Tasks

A Fuzzy Neural Network for Data Mining: Dealing with the Problem of Small Disjuncts.

Mining Software Repository: A Survey

Application of Data Mining Technology in Software Engineering

Data Mining and Machine Learning for Software Engineering

Rare Event Prediction Using Similarity Majority Under-Sampling Technique

Mining Software Engineering Data

Software intelligence: the future of mining software engineering data.

FRI-Miner: Fuzzy Rare Itemset Mining

A Mining Approach to Obtain the Software Vulnerability Characteristics

Small data machine learning in materials science

A Local Approach and Comparison with Other Data Mining Approaches in Software Application

Leveraging Data Mining Algorithms to Recommend Source Code Changes

Mining Java Memory Errors using Subjective Interesting Subgroups with Hierarchical Targets