Local Interpretations for Explainable Natural Language Processing: A Survey

Siwen Luo,Hamish Ivison,Soyeon Caren Han,Josiah Poon

DOI: https://doi.org/10.1145/3649450

IF: 16.6

2024-03-15

ACM Computing Surveys

Abstract: As the use of deep learning techniques has grown across various fields over the past decade, complaints about the opaqueness of the black-box models have increased, resulting in an increased focus on transparency in deep learning models. This work investigates various methods to improve the interpretability of deep neural networks for Natural Language Processing (NLP) tasks, including machine translation and sentiment analysis. We provide a comprehensive discussion on the definition of the term interpretability and its various aspects at the beginning of this work. The methods collected and summarised in this survey are only associated with local interpretation and are specifically divided into three categories: 1) interpreting the model’s predictions through related input features; 2) interpreting through natural language explanation; 3) probing the hidden states of models and word representations.

computer science, theory & methods

What problem does this paper attempt to address?

### Problems the Paper Attempts to Solve

This paper surveys methods for improving the interpretability of deep neural networks (DNNs) in natural language processing (NLP) tasks. Specifically:

1. **Background and Motivation**:
   - Over the past 10 years, the use of deep learning across many fields has grown significantly, which has raised concerns about the opacity of "black box" models.
   - The paper points out that although DNNs perform excellently on many tasks, their inability to explain their predictions has become a serious issue, limiting their application in critical fields such as healthcare and justice.
2. **Research Objectives**:
   - The research aims to investigate and summarize different methods for improving the interpretability of deep neural networks in NLP tasks.
   - Special focus is given to local explanation methods, which explain specific decisions or instances.
3. **Main Content**:
   - Defines the concept of "interpretability" and discusses its various aspects, such as fidelity, stability, comprehensibility, and trustworthiness.
   - Divides local explanation methods into three main categories:
     - **Feature Importance Methods**: Identify the most important elements in the input instance.
     - **Natural Language Explanations**: Generate textual explanations to illustrate the prediction results.
     - **Probing Methods**: Examine the internal state of the model given an input.
4. **Application Scenarios**:
   - Suitable for ordinary users without expertise in machine learning or deep learning, helping them understand and verify the correctness of model predictions.

In summary, the main goal of the paper is to survey local explanation methods that improve the interpretability of deep neural networks in NLP tasks, making these models better suited to real-world scenarios.
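To make the feature importance category above more concrete, here is a minimal sketch of one simple variant, leave-one-out occlusion: each token is removed in turn and the change in the model's score is taken as that token's importance. This is an illustration only, not a method taken from the survey. The lexicon `TOY_LEXICON`, the scorer `toy_score`, and the function `leave_one_out_importance` are all hypothetical stand-ins for a trained sentiment model.

```python
from typing import Callable, List, Tuple

# Hypothetical toy sentiment lexicon; it stands in for a trained model.
TOY_LEXICON = {"great": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}


def toy_score(tokens: List[str]) -> float:
    """Sum the lexicon weights of the tokens (assumed positive-class score)."""
    return sum(TOY_LEXICON.get(t.lower(), 0.0) for t in tokens)


def leave_one_out_importance(
    tokens: List[str],
    score_fn: Callable[[List[str]], float],
) -> List[Tuple[str, float]]:
    """Importance of a token = score of the full input minus score without it."""
    base = score_fn(tokens)
    importances = []
    for i, tok in enumerate(tokens):
        reduced = tokens[:i] + tokens[i + 1:]  # drop the i-th token only
        importances.append((tok, base - score_fn(reduced)))
    return importances


tokens = "The plot was great but the ending was bad".split()
importances = leave_one_out_importance(tokens, toy_score)

# Rank tokens by the magnitude of their effect on the score.
ranking = sorted(importances, key=lambda pair: abs(pair[1]), reverse=True)
for tok, imp in ranking[:2]:
    print(f"{tok}: {imp:+.1f}")
```

With a real model, `score_fn` would wrap the model's predicted probability for the class of interest. The explanation is local in the sense the summary describes: it applies to this one input, not to the model as a whole.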

From Understanding to Utilization: A Survey on Explainability for Large Language Models

Post-hoc Interpretability for Neural NLP: A Survey

Interpreting Deep Learning Models in Natural Language Processing: A Review

Explainability of Text Processing and Retrieval Methods: A Critical Survey

A Survey of the Interpretability Aspect of Deep Learning Models

Interpretable deep learning: interpretation, interpretability, trustworthiness, and beyond

A Survey on Neural Network Interpretability

Interpretable Deep Learning Models: Enhancing Transparency and Trustworthiness in Explainable AI

Explainability for Large Language Models: A Survey

Explainable Artificial Intelligence: Understanding, Visualizing and Interpreting Deep Learning Models

Interpretability of deep learning models: A survey of results

Multi-resolution Interpretation and Diagnostics Tool for Natural Language Classifiers

Explaining Explanations: An Overview of Interpretability of Machine Learning

Visual Interpretability for Deep Learning: a Survey

Towards Interpretable Natural Language Understanding with Explanations As Latent Variables

Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks

Interpretability in Graph Neural Networks

Trusting deep learning natural-language models via local and global explanations