On the Interpretation of Convolutional Neural Networks for Text Classification

Jincheng Xu, Qingfeng Du
DOI: https://doi.org/10.3233/faia200352
2020-01-01
Abstract: A long-standing obstacle accompanying the growing popularity of convolutional neural networks (CNNs) is the lack of interpretability, which is essential to explain the decision-making process and diagnose the model’s behavior. In this paper, we present a mathematical decomposition that translates the output of a CNN for text classification into an n-gram-level score matrix and a word-level score matrix, revealing quantitatively how various parts of input sentences contribute to the final prediction. By exploiting the derived n-gram-level score matrix, we perform extensive experiments to demonstrate how n-gram features are learned via the convolutional layer. We refine previous intuitions about the behavior of filters and perform a deep investigation into their underlying properties. By leveraging the resulting word-level score matrix, we propose two visualization methods, in either a global view or a local view, to show how the model highlights the relative importance of inputs to arrive at a particular result. Moreover, we show how to perform adversarial attacks with word-level importance scores, and we achieve a higher success rate than the baseline. Consequently, by interpreting the model in the form of score matrices, we are able to zoom in on the black boxes of CNN-based text classification models and present a comprehensive analysis of their behaviors.