Abstract: Despite the high accuracy offered by state-of-the-art deep natural-language models (e.g., LSTM, BERT), their application in real-life settings is still widely limited, as they behave like a black box to the end user. Hence, explainability is rapidly becoming a fundamental requirement of future-generation data-driven systems based on deep-learning approaches. Several attempts have been made to bridge the existing gap between accuracy and interpretability. However, robust and specialized eXplainable Artificial Intelligence solutions tailored to deep natural-language models are still missing. We propose a new framework, named T-EBAnO, which provides innovative prediction-local and class-based model-global explanation strategies tailored to deep-learning natural-language models. Given a deep NLP model and the textual input data, T-EBAnO provides an objective, human-readable, domain-specific assessment of the reasons behind the automatic decision-making process. Specifically, the framework extracts sets of interpretable features by mining the inner knowledge of the model. Then, it quantifies the influence of each feature during the prediction process by exploiting the normalized Perturbation Influence Relation index at the local level and the novel Global Absolute Influence and Global Relative Influence indexes at the global level. The effectiveness and quality of the local and global explanations obtained with T-EBAnO are demonstrated through an extensive set of experiments addressing different tasks, such as a sentiment-analysis task performed by a fine-tuned BERT model and a toxic-comment classification task performed by an LSTM model. The quality of the explanations proposed by T-EBAnO, and specifically the correlation between the influence index and human judgment, has been evaluated by humans in a survey with more than 4000 judgments.
To demonstrate the generality of T-EBAnO and its model- and task-independent methodology, experiments with other models (ALBERT, ULMFiT) on popular public datasets (AG News and CoLA) are also discussed in detail.

On Sample Based Explanation Methods for Sequence-to-Sequence Applications

On Sample Based Explanation Methods for NLP: Faithfulness, Efficiency and Semantic Evaluation

ReX: A Framework for Incorporating Temporal Information in Model-Agnostic Local Explanation Techniques

Explanation Space: A New Perspective into Time Series Interpretability

Harnessing Explanations: LLM-to-LM Interpreter for Enhanced Text-Attributed Graph Representation Learning

Robust explainer recommendation for time series classification

ReX: A Framework for Generating Local Explanations to Recurrent Neural Networks

Explanation as a process: user-centric construction of multi-level and multi-modal explanations

Recommendation with Dynamic Natural Language Explanations

Explaining Black Box Predictions and Unveiling Data Artifacts through Influence Functions

Sample Enrichment via Temporary Operations on Subsequences for Sequential Recommendation

DeepSeer: Interactive RNN Explanation and Debugging via State Abstraction

Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data

Sample based Explanations via Generalized Representers

Trusting deep learning natural-language models via local and global explanations

Self-Explaining Structures Improve NLP Models

T-Explainer: A Model-Agnostic Explainability Framework Based on Gradients

Learning from Explanations with Neural Execution Tree

Analyzing the Influence of Training Samples on Explanations