Understanding imbalanced data: XAI & interpretable ML framework

Damien Dablain, Colin Bellinger, Bartosz Krawczyk, David W. Aha, Nitesh Chawla
DOI: https://doi.org/10.1007/s10994-023-06414-w
IF: 5.414
2024-01-18
Machine Learning
Abstract: There is a gap between current methods that explain deep learning models that work on imbalanced image data and the needs of the imbalanced learning community. Existing methods that explain imbalanced data are geared toward binary classification, single layer machine learning models and low dimensional data. Current eXplainable Artificial Intelligence (XAI) techniques for vision data mainly focus on mapping predictions of specific instances to inputs, instead of examining global data properties and complexities of entire classes. Therefore, there is a need for a framework that is tailored to modern deep networks, that incorporates large, high dimensional, multi-class datasets, and uncovers data complexities commonly found in imbalanced data. We propose a set of techniques that can be used by both deep learning model users to identify, visualize and understand class prototypes, sub-concepts and outlier instances; and by imbalanced learning algorithm developers to detect features and class exemplars that are key to model performance. The components of our framework can be applied sequentially in their entirety or individually, making it fully flexible to the user's specific needs (https://github.com/dd1github/XAI_for_Imbalanced_Learning).
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper primarily aims to address the interpretability of deep learning models, particularly Convolutional Neural Networks (CNNs), trained on imbalanced datasets. Specifically, it focuses on the following key issues:

1. **Limitations of Existing Explanation Techniques**: Existing methods for explaining imbalanced data are geared toward binary classification, single-layer machine learning models, and low-dimensional data. Meanwhile, most Explainable Artificial Intelligence (XAI) techniques for vision data map the prediction for a specific instance back to its input rather than examining the properties and complexities of entire classes on a global scale.
2. **Characteristics of Imbalanced Datasets**: Class imbalance exacerbates the entanglement of latent features, class overlap, and the impact of noisy instances on the classifier. A method designed for modern deep networks is therefore needed, one that can handle large-scale, high-dimensional, multi-class datasets and reveal the data complexities commonly found in imbalanced data.
3. **Need to Combine XAI and Imbalanced Learning**: Both XAI and imbalanced learning emphasize the importance of interpretation, but they approach the problem from different perspectives. XAI usually focuses on the interpretability of the model, while imbalanced learning seeks to better understand data complexity. The paper combines elements from both fields to better understand CNN predictions on imbalanced data.

To address these challenges, the paper makes the following contributions:

- **High-Dimensional Imbalanced Data Understanding Framework**: Provides a set of tools that can visualize crucial concepts in imbalanced learning, such as class prototypes, sub-concepts, and class overlap.
- **Predicting Relative False Positives**: Demonstrates how to use training data to predict the most likely false positive classes for a given reference class during inference.
- **Class-Salient Color Visualization**: Unlike existing black-and-white numerical heatmaps that show pixel saliency only for single data instances, the paper proposes a visualization method that displays the most salient colors used by CNNs when recognizing entire classes.

Through these contributions, the paper aims to provide a flexible framework and toolkit that helps deep learning model users and developers better understand imbalanced data and its impact on model performance.
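
To make the false-positive contribution more concrete, here is a minimal Python sketch of one simple way such a ranking could be derived from training data. It is not the paper's method: it assumes the ranking can be approximated by counting how often training instances of other classes are predicted as the reference class. The function name `rank_false_positive_classes` and the toy labels are hypothetical and exist only for illustration.

```python
from collections import Counter


def rank_false_positive_classes(y_true, y_pred, reference_class):
    """Rank other classes by how often their instances are predicted as reference_class.

    Assumption: a confusion-count proxy on training data, not the paper's procedure.
    """
    counts = Counter(
        true
        for true, pred in zip(y_true, y_pred)
        if pred == reference_class and true != reference_class
    )
    # Sort by descending count; break ties by label so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))


# Made-up training labels and model predictions, purely for illustration.
y_true = ["cat", "cat", "dog", "dog", "dog", "truck", "truck", "cat"]
y_pred = ["cat", "dog", "dog", "cat", "cat", "cat", "truck", "cat"]

ranking = rank_false_positive_classes(y_true, y_pred, "cat")
print(ranking)
```

In this toy data, "dog" instances are mistaken for "cat" more often than "truck" instances are, so "dog" would be flagged as the most likely source of false positives for the "cat" class. A ranking of this kind only reflects errors the model already makes on the training set; the paper's repository linked in the abstract is the reference for the actual procedure.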