Abstract: Despite the high accuracy offered by state-of-the-art deep natural-language models (e.g., LSTM, BERT), their application in real-life settings is still widely limited, as they behave like a black box to the end user. Hence, explainability is rapidly becoming a fundamental requirement of future-generation data-driven systems based on deep-learning approaches. Several attempts have been made to bridge the existing gap between accuracy and interpretability. However, robust and specialized eXplainable Artificial Intelligence solutions, tailored to deep natural-language models, are still missing. We propose a new framework, named T-EBAnO, which provides innovative prediction-local and class-based model-global explanation strategies tailored to deep-learning natural-language models. Given a deep NLP model and the textual input data, T-EBAnO provides an objective, human-readable, domain-specific assessment of the reasons behind the automatic decision-making process. Specifically, the framework extracts sets of interpretable features by mining the inner knowledge of the model. Then, it quantifies the influence of each feature during the prediction process by exploiting the normalized Perturbation Influence Relation index at the local level and the novel Global Absolute Influence and Global Relative Influence indexes at the global level. The effectiveness and the quality of the local and global explanations obtained with T-EBAnO are demonstrated through an extensive set of experiments addressing different tasks, such as a sentiment-analysis task performed by a fine-tuned BERT model and a toxic-comment classification task performed by an LSTM model. The quality of the explanations proposed by T-EBAnO and, specifically, the correlation between the influence index and human judgment have been evaluated by humans in a survey with more than 4000 judgments.
To demonstrate the generality of T-EBAnO and its model- and task-independent methodology, experiments with other models (ALBERT, ULMFiT) on popular public datasets (AG News and CoLA) are also discussed in detail.
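
The idea of quantifying a feature's influence through perturbation can be illustrated with a minimal sketch. The abstract does not define the normalized Perturbation Influence Relation index, so the code below does not attempt to reproduce it: it uses a plain difference of class probabilities before and after masking a feature as a stand-in. The names `perturbation_influence`, `toy_predict_proba` and the `[MASK]` placeholder are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Set


def perturbation_influence(
    predict_proba: Callable[[List[str]], float],
    tokens: List[str],
    feature_idx: Set[int],
    mask_token: str = "[MASK]",
) -> float:
    """Illustrative influence score in [-1, 1] (NOT the paper's index).

    Positive values mean the masked feature supported the class of
    interest; negative values mean it worked against that class.
    """
    # Probability of the class of interest on the original input.
    p_orig = predict_proba(tokens)
    # Perturb the input by masking the tokens that form the feature.
    perturbed = [mask_token if i in feature_idx else t
                 for i, t in enumerate(tokens)]
    p_pert = predict_proba(perturbed)
    # Assumption: a simple probability difference stands in for the index.
    return p_orig - p_pert


def toy_predict_proba(tokens: List[str]) -> float:
    """Hypothetical stand-in for a deep NLP model: P(positive sentiment)."""
    score = 0.5
    score += 0.2 * sum(t == "great" for t in tokens)
    score -= 0.2 * sum(t == "awful" for t in tokens)
    # Clamp so the result is a valid probability.
    return min(1.0, max(0.0, score))


if __name__ == "__main__":
    sentence = ["the", "movie", "was", "great"]
    # Influence of the single-token feature "great" (index 3).
    print(perturbation_influence(toy_predict_proba, sentence, {3}))
```

In this toy setting, masking "great" lowers the positive-class probability, so the score is positive, while masking a neutral word such as "movie" leaves the prediction unchanged and yields zero. A real deployment would replace `toy_predict_proba` with the model under analysis and the feature indices with the interpretable features extracted by the framework.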

Explaining neural networks without access to training data

Which Neural Network Makes More Explainable Decisions? an Approach Towards Measuring Explainability

Neural Reasoning Networks: Efficient Interpretable Neural Networks With Automatic Textual Explanations

What is Interpretability?

Explainable Neural Networks: Achieving Interpretability in Neural Models

Interpretability in Graph Neural Networks

Sensitivity based Neural Networks Explanations

Towards Explainable Neural-Symbolic Visual Reasoning

XInsight: Revealing Model Insights for GNNs with Flow-based Explanations

Explainability as statistical inference

A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts

GNNExplainer: Generating Explanations for Graph Neural Networks

Disentangled Explanations of Neural Network Predictions by Finding Relevant Subspaces

A Survey on Neural Network Interpretability

Improving Network Interpretability via Explanation Consistency Evaluation

Scalable Partial Explainability in Neural Networks via Flexible Activation Functions

Explaining Deep Graph Networks via Input Perturbation

Trusting deep learning natural-language models via local and global explanations

Explaining Deep Neural Networks by Leveraging Intrinsic Methods

Global Concept-Based Interpretability for Graph Neural Networks via Neuron Analysis

Explaining Hypergraph Neural Networks: From Local Explanations to Global Concepts