Local Interpretations for Explainable Natural Language Processing: A Survey

Siwen Luo, Hamish Ivison, Soyeon Caren Han, Josiah Poon
DOI: https://doi.org/10.1145/3649450
IF: 16.6
2024-03-15
ACM Computing Surveys
Abstract: As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are only associated with local interpretation and are specifically divided into three categories: 1) interpreting the model’s predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.
computer science, theory & methods
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper explores how to improve the interpretability of deep neural networks (DNNs) in natural language processing (NLP) tasks. Specifically:

1. **Background and Motivation**:
   - Over the past decade, the use of deep learning across many fields has grown substantially, raising concerns about the opacity of "black-box" models.
   - The paper notes that although DNNs perform well on many tasks, their inability to explain their predictions is a serious obstacle that limits their use in high-stakes fields such as healthcare and justice.
2. **Research Objectives**:
   - The survey investigates and summarizes methods for improving the interpretability of DNNs in NLP tasks.
   - It focuses specifically on local interpretation methods, which explain individual decisions or instances.
3. **Main Content**:
   - Defines the concept of "interpretability" and discusses its various aspects, such as fidelity, stability, comprehensibility, and trustworthiness.
   - Groups local interpretation methods into three categories:
     - **Feature Importance Methods**: identify the most important elements of the input instance.
     - **Natural Language Explanations**: generate textual explanations of the prediction.
     - **Probing Methods**: examine the model's hidden states and word representations for a given input.
4. **Application Scenarios**:
   - Local explanations suit lay users without machine learning or deep learning expertise, helping them understand model predictions and check whether those predictions are correct.

In summary, the paper aims to improve the interpretability of DNNs in NLP tasks through local interpretation methods, making these models more suitable for real-world applications.
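As a rough illustration of the feature-importance category, the Python sketch below scores each token by how much a model's output drops when that token is deleted. This is leave-one-out occlusion, one common technique in this family and not necessarily one the survey singles out. The lexicon `TOY_WEIGHTS` and the scorer `toy_positive_prob` are hypothetical stand-ins for a trained sentiment classifier and do not come from the paper.

```python
import math
from typing import Callable, Dict, List, Tuple

# Hypothetical toy lexicon standing in for a trained sentiment model (assumption).
TOY_WEIGHTS: Dict[str, float] = {
    "great": 2.0,
    "boring": -1.5,
    "not": -0.5,
    "plot": 0.1,
}


def toy_positive_prob(tokens: List[str]) -> float:
    """Positive-class probability from a logistic bag-of-words score."""
    score = sum(TOY_WEIGHTS.get(tok, 0.0) for tok in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def leave_one_out_importance(
    tokens: List[str], predict: Callable[[List[str]], float]
) -> List[Tuple[str, float]]:
    """Importance of each token = prediction on full input minus prediction without it."""
    full = predict(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # delete token i only
        importances.append((tok, full - predict(reduced)))
    return importances


if __name__ == "__main__":
    sentence = "the plot was great but boring".split()
    scores = leave_one_out_importance(sentence, toy_positive_prob)
    # Print tokens from most to least influential (by absolute importance).
    for tok, imp in sorted(scores, key=lambda pair: -abs(pair[1])):
        print(f"{tok:>8s} {imp:+.3f}")
```

For this sentence, "great" receives the largest positive importance and "boring" a negative one. Tokens the toy model ignores, such as "the", score zero. With a real neural classifier, `predict` would wrap the model's forward pass, and the same loop yields a per-token local explanation for that single prediction.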