Abstract: With the growing adoption of AI-based systems across everyday life, the need to understand their decision-making mechanisms is correspondingly increasing. The level at which we can trust the statistical inferences made from AI-based decision systems is an increasing concern, especially in high-risk systems such as criminal justice or medical diagnosis, where incorrect inferences may have tragic consequences. Despite their successes in providing solutions to problems involving real-world data, deep learning (DL) models cannot quantify the certainty of their predictions. These models are frequently quite confident, even when their solutions are incorrect. This work presents a method to infer prominent features in two DL classification models trained on clinical and non-clinical text by employing techniques from topological and geometric data analysis. We create a graph of a model's feature space and cluster the inputs into the graph's vertices by the similarity of features and prediction statistics. We then extract subgraphs demonstrating high predictive accuracy for a given label. These subgraphs contain a wealth of information about features that the DL model has recognized as relevant to its decisions. We infer these features for a given label using a distance metric between probability measures, and demonstrate the stability of our method compared to the LIME and SHAP interpretability methods. This work establishes that we may gain insights into the decision mechanism of a DL model. This method allows us to ascertain whether the model is making its decisions based on information germane to the problem or is identifying extraneous patterns within the data.

What problem does this paper attempt to address?

The paper addresses the lack of interpretability of deep learning (DL) models in high-risk domains, in particular their inability to quantify the uncertainty of their predictions and their tendency to report high confidence even when a prediction is wrong. This "black-box" behavior limits the use of DL models in critical areas such as criminal justice and medical diagnosis, where erroneous inferences can have severe consequences.

To tackle this problem, the authors propose a method based on Topological Data Analysis (TDA). The method builds a graph representation of the model's feature space and clusters the inputs into vertices that share similar features and prediction statistics. Subgraphs with high predictive accuracy for a given label are then extracted, and these carry information about the features the model treats as relevant to its decisions. The contributions of the method are:

1. Proposing an interpretability method for any AI model that can construct a topologically correct representation of the model's feature space and provide feature insights from both global and local perspectives.
2. Using the Mapper algorithm to create a low-dimensional representation of the feature space, which clusters input data into different regions based on feature similarity and prediction accuracy.
3. Demonstrating the effectiveness of the proposed method through experimental results on two models (a multi-task convolutional neural network [MTCNN] for information extraction tasks from cancer pathology reports, and a convolutional neural network [CNN] trained on the public 20newsgroups dataset), and showing that the method is more stable than existing interpretability methods such as LIME and SHAP.

In summary, the goal of this paper is to increase trust in AI model predictions by providing a new interpretability framework, especially in fields requiring high reliability, such as medical diagnosis.
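
To make the Mapper step more concrete, the following is a minimal sketch of the general construction, not the authors' implementation. It assumes a one-dimensional lens (for example, a model's predicted probability for one label), a cover of equal-length overlapping intervals, and single-linkage clustering with a fixed distance threshold. The function names (`cover_intervals`, `cluster`, `mapper_graph`), the parameters (`n_intervals`, `overlap`, `eps`), and the toy data are all hypothetical choices made for illustration.

```python
from itertools import combinations
from math import dist


def cover_intervals(lo, hi, n_intervals, overlap):
    """Split [lo, hi] into equal-length intervals, each padded by a fraction of its width."""
    width = (hi - lo) / n_intervals
    pad = width * overlap
    return [(lo + i * width - pad, lo + (i + 1) * width + pad)
            for i in range(n_intervals)]


def cluster(indices, points, eps):
    """Single-linkage clustering: connected components of the eps-neighborhood graph."""
    unvisited = set(indices)
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        component, frontier = {seed}, [seed]
        while frontier:
            p = frontier.pop()
            near = {q for q in unvisited if dist(points[p], points[q]) <= eps}
            unvisited -= near
            component |= near
            frontier.extend(near)
        clusters.append(frozenset(component))
    return clusters


def mapper_graph(points, lens, n_intervals=2, overlap=0.25, eps=1.0):
    """Return (vertices, edges): vertices are clusters of point indices,
    and an edge joins two vertices whenever they share at least one point."""
    vertices = []
    for a, b in cover_intervals(min(lens), max(lens), n_intervals, overlap):
        members = [i for i, v in enumerate(lens) if a <= v <= b]
        vertices.extend(cluster(members, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(vertices)), 2)
             if vertices[u] & vertices[v]}
    return vertices, edges


# Toy data (assumed): 1-D "feature vectors" and a per-input prediction statistic.
points = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,)]
lens = [0.0, 0.2, 0.5, 0.5, 0.8, 1.0]
vertices, edges = mapper_graph(points, lens, n_intervals=2, overlap=0.25, eps=1.5)
```

Because the intervals overlap, an input whose lens value falls in the overlap can land in clusters from two adjacent intervals, and that shared membership is what creates the graph's edges. In the setting described above, per-vertex prediction statistics could then be used to pick out subgraphs with high accuracy for a label; that selection step is not shown here.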

Topological Interpretability for Deep-Learning

A Comprehensive Review of Deep Neural Network Interpretation Using Topological Data Analysis

A Survey of the Interpretability Aspect of Deep Learning Models

Relevance Inference Based on Direct Contribution: Counterfactual Explanation to Deep Networks for Intelligent Decision-making

Geometric and Topological Inference for Deep Representations of Complex Networks

Towards Understanding Sensitive and Decisive Patterns in Explainable AI: A Case Study of Model Interpretation in Geometric Deep Learning

Interpretability of deep learning models: A survey of results

Physics-Inspired Interpretability Of Machine Learning Models

Topological structure of complex predictions

Interpreting Deep Learning Models for Knowledge Tracing

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Experimental Observations of the Topology of Convolutional Neural Network Activations

Explainable Deep Learning: A Visual Analytics Approach with Transition Matrices

Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond

MOUNTAINEER: Topology-Driven Visual Analytics for Comparing Local Explanations

Discovering interpretable models of scientific image data with deep learning

Explainable deep learning in healthcare: A methodological survey from an attribution view

Interpretable Deep Learning under Fire

Topological deep learning: a review of an emerging paradigm

A Detailed Study of Interpretability of Deep Neural Network based Top Taggers

Visual Interpretability for Deep Learning