Abstract: RNN models have achieved state-of-the-art performance in a wide range of text mining tasks. However, these models are often regarded as black boxes and are criticized for their lack of interpretability. In this paper, we enhance the interpretability of RNNs by providing interpretable rationales for RNN predictions. Interpreting RNNs is challenging for two reasons. First, unlike existing methods that rely on local approximation, we aim to provide rationales that are more faithful to the decision-making process of RNN models. Second, a flexible interpretation method should be able to assign contribution scores to text segments of varying lengths, rather than only to individual words. To tackle these challenges, we propose a novel attribution method, called REAT, to interpret RNN predictions. REAT decomposes the final prediction of an RNN into the additive contributions of the individual words in the input text. This additive decomposition enables REAT to further obtain phrase-level attribution scores. In addition, REAT is generally applicable to various RNN architectures, including GRU, LSTM, and their bidirectional versions. Experimental results demonstrate the faithfulness and interpretability of the proposed attribution method. Comprehensive analysis shows that our attribution method can unveil useful linguistic knowledge captured by RNNs. Further analysis demonstrates that our method can be used as a debugging tool to examine the vulnerabilities and failure causes of RNNs, which may lead to several promising future directions for improving the generalization ability of RNNs.
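The idea of an additive decomposition with phrase-level scores can be illustrated with a small sketch. It is not the REAT algorithm itself, whose details are not given in the abstract. It assumes a toy gated recurrence of the form h_t = a_t * h_{t-1} + (1 - a_t) * c_t with a linear output layer, for which the final logit splits exactly into one term per word. All names (`run_toy_gated_rnn`, `word_scores`, `phrase_score`, `make_params`) and the random parameters are hypothetical and chosen only for illustration.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * x for m, x in zip(row, vec)) for row in matrix]


def make_params(hidden, dim, seed=0):
    """Random parameters for the toy model (illustrative, not trained)."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

    return {
        "Wa": mat(hidden, dim), "Ua": mat(hidden, hidden),  # gate weights
        "Wc": mat(hidden, dim), "Uc": mat(hidden, hidden),  # candidate weights
        "w_out": [rng.uniform(-1, 1) for _ in range(hidden)],
        "b": 0.1,
    }


def run_toy_gated_rnn(embeddings, p):
    """Forward pass; returns all hidden states, all gates, and the final logit."""
    h = [0.0] * len(p["w_out"])  # assumption: zero initial state
    states, gates = [], []
    for x in embeddings:
        a_in = [i + j for i, j in zip(matvec(p["Wa"], x), matvec(p["Ua"], h))]
        c_in = [i + j for i, j in zip(matvec(p["Wc"], x), matvec(p["Uc"], h))]
        a = [1.0 / (1.0 + math.exp(-v)) for v in a_in]  # keep gate in (0, 1)
        c = [math.tanh(v) for v in c_in]                # candidate state
        h = [ai * hi + (1.0 - ai) * ci for ai, hi, ci in zip(a, h, c)]
        states.append(h)
        gates.append(a)
    logit = sum(w * hi for w, hi in zip(p["w_out"], h)) + p["b"]
    return states, gates, logit


def word_scores(states, gates, w_out):
    """Per-word contribution to the final logit (bias excluded).

    Word t adds the update (h_t - a_t * h_{t-1}), which is then scaled by
    every later gate a_{t+1}, ..., a_T before it reaches the final state.
    """
    hidden = len(w_out)
    scores = [0.0] * len(states)
    carry = [1.0] * hidden  # running product of the gates after position t
    for t in range(len(states) - 1, -1, -1):
        prev = states[t - 1] if t > 0 else [0.0] * hidden
        update = [h - a * hp for h, a, hp in zip(states[t], gates[t], prev)]
        scores[t] = sum(w * u * c for w, u, c in zip(w_out, update, carry))
        carry = [c * a for c, a in zip(carry, gates[t])]
    return scores


def phrase_score(scores, start, end):
    """Phrase-level score: sum of word scores over the span [start, end)."""
    return sum(scores[start:end])


# Demo on five random "word embeddings" of dimension 3.
rng = random.Random(1)
params = make_params(hidden=4, dim=3)
embeddings = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(5)]
states, gates, logit = run_toy_gated_rnn(embeddings, params)
scores = word_scores(states, gates, params["w_out"])
residual = logit - params["b"] - sum(scores)  # numerically zero: the sum telescopes
```

Because the per-word terms telescope to the final hidden state, the word scores plus the bias reproduce the logit, and a phrase score is just the sum of the word scores in its span. In this toy version the gates and candidates still depend on earlier words through the hidden state, so the scores describe how much each position's update survives to the output, not a fully disentangled effect of each word.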

Understanding Hidden Memories of Recurrent Neural Networks

Residual Recurrent Neural Networks for Learning Sequential Representations

Visualizing and Understanding Neural Models in NLP

Towards Interpreting Recurrent Neural Networks Through Probabilistic Abstraction

Understanding Recurrent Neural State Using Memory Signatures

Memory Visualization for Gated Recurrent Neural Networks in Speech Recognition

Increasing the Interpretability of Recurrent Neural Networks Using Hidden Markov Models

Assessing the Memory Ability of Recurrent Neural Networks

DeepSeer: Interactive RNN Explanation and Debugging via State Abstraction

Recurrent Memory Networks for Language Modeling

A Novel Framework for Recurrent Neural Networks with Enhancing Information Processing and Transmission between Units

On Attribution of Recurrent Neural Network Predictions via Additive Decomposition

On the Relationship Between RNN Hidden State Vectors and Semantic Ground Truth

Evaluating Recurrent Neural Network Explanations

NeuroView-RNN: It's About Time

Optimization of Recurrent Neural Networks on Natural Language Processing

Memory and Information Processing in Recurrent Neural Networks

How recurrent networks implement contextual processing in sentiment analysis

Non-local Recurrent Neural Memory for Supervised Sequence Modeling

Ordered Memory

ReMemNN: A novel memory neural network for powerful interaction in aspect-based sentiment analysis