Abstract: RNN models have achieved state-of-the-art performance in a wide range of text mining tasks. However, these models are often regarded as black boxes and are criticized for their lack of interpretability. In this paper, we enhance the interpretability of RNNs by providing interpretable rationales for their predictions. Doing so poses two challenges. First, unlike existing methods that rely on local approximation, we aim to provide rationales that are more faithful to the decision-making process of RNN models. Second, a flexible interpretation method should be able to assign contribution scores to text segments of varying lengths, rather than only to individual words. To tackle these challenges, we propose a novel attribution method, called REAT, to interpret RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of the individual words in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results demonstrate the faithfulness and interpretability of the proposed attribution method. Comprehensive analysis shows that our attribution method can unveil useful linguistic knowledge captured by RNNs. Further analysis demonstrates that our method can be used as a debugging tool to examine the vulnerabilities and failure causes of RNNs, which may lead to several promising future directions for improving the generalization ability of RNNs.
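The core idea of an additive decomposition can be illustrated with a small sketch. It assumes a GRU written as h_t = z_t ⊙ h_{t-1} + (1 − z_t) ⊙ h̃_t (z_t is the update gate), a zero initial state, and a linear readout w·h_T + b on the final hidden state. Unrolling the recurrence gives the exact identity h_T = Σ_t (Π_{j>t} z_j) ⊙ g_t with g_t = (1 − z_t) ⊙ h̃_t, so the logit splits into one term per word plus the bias. The helper names (`gru_step`, `word_contributions`, `phrase_score`) and the random toy weights are hypothetical; this shows the telescoping identity only and is not necessarily REAT's precise scoring rule, nor does it cover the LSTM or bidirectional cases.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Dense matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def gru_step(params, h_prev, x):
    """One GRU step written as h_t = z_t * h_prev + g_t (elementwise)."""
    Wz, Uz, Wr, Ur, Wh, Uh = params
    z = [sigmoid(a + b) for a, b in zip(matvec(Wz, x), matvec(Uz, h_prev))]
    r = [sigmoid(a + b) for a, b in zip(matvec(Wr, x), matvec(Ur, h_prev))]
    reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
    h_tilde = [math.tanh(a + b) for a, b in zip(matvec(Wh, x), matvec(Uh, reset_h))]
    # g_t: the new evidence written into the state at this step.
    g = [(1.0 - zi) * hti for zi, hti in zip(z, h_tilde)]
    h = [zi * hp + gi for zi, hp, gi in zip(z, h_prev, g)]
    return h, z, g


def word_contributions(params, w_out, b_out, xs, hidden):
    """Return (logit, per-word scores) such that logit == sum(scores) + b_out."""
    h = [0.0] * hidden  # assumes a zero initial state
    zs, gs = [], []
    for x in xs:
        h, z, g = gru_step(params, h, x)
        zs.append(z)
        gs.append(g)
    logit = sum(w * hi for w, hi in zip(w_out, h)) + b_out

    T = len(xs)
    scores = [0.0] * T
    decay = [1.0] * hidden  # running product of the update gates after step t
    for t in range(T - 1, -1, -1):
        # Evidence g_t, attenuated by every later update gate, read out linearly.
        carried = [d * gi for d, gi in zip(decay, gs[t])]
        scores[t] = sum(w * c for w, c in zip(w_out, carried))
        decay = [d * zi for d, zi in zip(decay, zs[t])]
    return logit, scores


def phrase_score(scores, start, end):
    """Phrase-level score, taken here as the sum of word scores in [start, end)."""
    return sum(scores[start:end])


def rand_mat(rows, cols):
    return [[random.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


random.seed(0)
H, D, T = 4, 3, 5  # hidden size, embedding size, sequence length (toy values)
# Order: Wz (HxD), Uz (HxH), Wr (HxD), Ur (HxH), Wh (HxD), Uh (HxH).
params = tuple(rand_mat(H, D) if i % 2 == 0 else rand_mat(H, H) for i in range(6))
w_out = [random.uniform(-1.0, 1.0) for _ in range(H)]
b_out = 0.1
# Stand-in word embeddings for a 5-token input.
xs = [[random.uniform(-1.0, 1.0) for _ in range(D)] for _ in range(T)]

logit, scores = word_contributions(params, w_out, b_out, xs, H)
print("logit:", logit)
print("sum of word scores + bias:", sum(scores) + b_out)
print("score of phrase xs[1:3]:", phrase_score(scores, 1, 3))
```

Because the identity is exact, the first two printed values agree up to floating-point rounding. Once word scores are available, a segment of any length can be scored by summing over its span, which is why an additive decomposition makes phrase-level attribution cheap.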

A Hierarchical Explanation Generation Method Based on Feature Interaction Detection

Generating Hierarchical Explanations on Text Classification Without Connecting Rules

Towards Hierarchical Importance Attribution: Explaining Compositional Semantics for Neural Sequence Models

PE: A Poincare Explanation Method for Fast Text Hierarchy Generation

A Novel Feature Decomposition Method To Develop Multi-Hierarchy Model

Asymmetric feature interaction for interpreting model predictions

Integrating Hierarchical Semantic into Iterative Generation Model for Entailment Tree Explanation

Deeply Explain CNN Via Hierarchical Decomposition

Hierarchical Aspect-guided Explanation Generation for Explainable Recommendation

Multi-Level Explanations for Generative Language Models

Explaining Black-box Model Predictions via Two-level Nested Feature Attributions with Consistency Property

Latent Concept-based Explanation of NLP Models

A Unified Framework for Input Feature Attribution Analysis

Explainable Recommendation Through Attentive Multi-View Learning

Stratified GNN Explanations through Sufficient Expansion

Hierarchical Interpretation of Neural Text Classification

Explaining Language Models' Predictions with High-Impact Concepts

Leveraging Local Structure for Improving Model Explanations: An Information Propagation Approach

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

Provably Better Explanations with Optimized Aggregation of Feature Attributions