Abstract: Transformer models have achieved unprecedented breakthroughs in text classification and have become the foundation of most state-of-the-art NLP systems. The core component driving this success is the attention mechanism, which lets the model dynamically focus on different parts of the input sequence when producing predictions. Several previous works have investigated using attention weights to explain model predictions because, intuitively, attention weights reflect the importance of input positions to the output. Specifically, the objective of explanation is to compute a relevance score for each input token, so that the input words most important to the prediction can be identified. However, previous efforts produced mixed results. We find that the key reason attention weights cannot be used directly as effective relevance indicators is that they carry no directional information about relevance (i.e., whether an input token contributes towards or against the prediction). We then propose two novel explanation techniques, AGrad and RePAGrad, that produce directional relevance scores based on attention weights. To evaluate explanation performance, we propose three properties that an effective explanation method should satisfy (faithfulness, resilience, and consistency) and design a corresponding test to quantify each property. Through extensive evaluations with Transformer models and pre-trained BERT models on multiple public text classification datasets, we show that AGrad and RePAGrad significantly outperform existing state-of-the-art explanation methods in faithfulness and consistency, at the cost of a nominal degradation in resilience compared to attention weights. In addition, we reveal that elements of the model architecture can play an important role in explainability.

Generating Plausible Counterfactual Explanations for Deep Transformers in Financial Text Classification

A user-centered explainable artificial intelligence approach for financial fraud detection

A Comprehensive Review on Financial Explainable AI

Generating Counterfactual Explanations with Natural Language

Explainable AI for Interpretable Credit Scoring

Investigating Explainability Methods in Recurrent Neural Network Architectures for Financial Time Series Data

Towards Explainable Artificial Intelligence in Banking and Financial Services

Deceptive AI Explanations: Creation and Detection

Explainable AI in Credit Risk Management

A Hypothesis on Good Practices for AI-based Systems for Financial Time Series Forecasting: Towards Domain-Driven XAI Methods

Model Interpretation and Explainability: Towards Creating Transparency in Prediction Models

NoMatterXAI: Generating "No Matter What" Alterfactual Examples for Explaining Black-Box Text Classification Models

Deceptive XAI: Typology, Creation and Detection

Explaining Explanations: An Overview of Interpretability of Machine Learning

Explaining AI in Finance: Past, Present, Prospects

Towards Responsible AI for Financial Transactions

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

On Exploring Attention-based Explanation for Transformer Models in Text Classification

Explain To Decide: A Human-Centric Review on the Role of Explainable Artificial Intelligence in AI-assisted Decision Making

A Survey of Explainable Artificial Intelligence (XAI) in Financial Time Series Forecasting