Abstract: Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms with linear scaling in model queries. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.
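The multi-level idea with linear query scaling can be illustrated with a minimal sketch. It treats one model query as a function that takes a (possibly perturbed) list of input units, runs the model, and scalarizes the output. For illustration it uses leave-one-out occlusion, one of the simplest attribution algorithms whose query count grows linearly with the number of units; this is not presented as MExGen's implementation. The names `loo_attribution`, `multilevel_attribution`, `toy_score`, and the "refine only the `top_k` highest-scoring sentences" rule are hypothetical, and `toy_score` is a stand-in for a real model combined with a scalarizer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A "scorer" is one model query: perturbed input units -> model output -> scalarizer -> float.
Scorer = Callable[[List[str]], float]


def loo_attribution(units: Sequence[str], score: Scorer) -> List[float]:
    """Leave-one-out occlusion: attribution of unit i is score(full input)
    minus score(input with unit i removed). Uses len(units) + 1 queries."""
    base = score(list(units))
    return [base - score(list(units[:i]) + list(units[i + 1:])) for i in range(len(units))]


def multilevel_attribution(
    sentences: Sequence[str], score: Scorer, top_k: int = 1
) -> Tuple[List[float], Dict[int, List[Tuple[str, float]]]]:
    """Coarse-to-fine pass (hypothetical refinement rule): attribute sentences,
    then attribute words only inside the top_k highest-scoring sentences,
    keeping all other sentences unchanged."""
    # Level 1 (coarse): sentences. Costs len(sentences) + 1 queries.
    sent_attr = loo_attribution(sentences, score)
    top = sorted(range(len(sentences)), key=lambda i: sent_attr[i], reverse=True)[:top_k]

    word_attr: Dict[int, List[Tuple[str, float]]] = {}
    for i in top:
        words = sentences[i].split()

        def word_score(ws: List[str], i: int = i) -> float:
            # Rebuild the full input with only sentence i perturbed at word level.
            return score(list(sentences[:i]) + [" ".join(ws)] + list(sentences[i + 1:]))

        # Level 2 (fine): words of sentence i. Costs len(words) + 1 queries.
        word_attr[i] = list(zip(words, loo_attribution(words, word_score)))
    return sent_attr, word_attr


# Hypothetical stand-in for "model + scalarizer": fraction of KEYWORDS present in the input.
KEYWORDS = {"paris", "capital"}


def toy_score(units: List[str]) -> float:
    tokens = " ".join(units).lower().split()
    return sum(k in tokens for k in KEYWORDS) / len(KEYWORDS)


sentences = ["Paris is the capital of France.", "It rains often.", "Bread is cheap."]
sent_attr, word_attr = multilevel_attribution(sentences, toy_score, top_k=1)
print(sent_attr)
print(word_attr)
```

Here the sentence pass costs 4 queries and the word pass over the single refined sentence costs 7, so the total stays linear in the number of units actually examined. Swapping in a sampling-based estimator such as LIME would change the per-level query budget but not the coarse-to-fine structure.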

What problem does this paper attempt to address?

The main problem this paper addresses is extending perturbation-based input attribution methods to generative language models in order to explain their outputs. Specifically, the paper focuses on two challenges:

1. **Text output**: Unlike in classification tasks, the output of a generative language model is text rather than a real number (such as a predicted class log-probability). The paper therefore introduces scalarizers, that is, functions \( S \) that map output text to real numbers so that attribution can be computed.
2. **Long text input**: Generative language tasks (such as summarization and context-grounded question answering) usually involve long input texts. This raises the computational cost, since perturbation-based methods need more model queries as the number of input units grows, and it also complicates the explanation, because users may want to understand the importance of the input at different levels of granularity.

To address these challenges, the paper proposes **MExGen** (Multi-level Explanations for Generative Language Models), a general framework that can be instantiated with different attribution algorithms and that handles long inputs with a multi-level strategy. The main contributions are:

- **The MExGen framework**: It extends perturbation-based input attribution methods to generative language models and uses a multi-level strategy to handle long input texts.
- **A study of multiple scalarizers**: It explores several ways of mapping output text to real numbers, including ones suited to models that provide only text output.
- **Systematic evaluation**: Through automated and human evaluations on summarization and context-grounded question answering, it shows that MExGen can provide more locally faithful explanations.
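The text-output challenge can be made concrete with two illustrative scalarizers: one for models that expose token log-probabilities and one for models that return only text. Both are a sketch under stated assumptions, not the paper's exact choices. `logprob_scalarizer` assumes the caller has already obtained, for each token of the original output, its log-probability given the perturbed input. `unigram_f1_scalarizer` is a hypothetical, deliberately simple text-similarity measure standing in for the more sophisticated similarity scorers one would likely use in practice.

```python
from collections import Counter
from typing import Sequence


def logprob_scalarizer(token_logprobs: Sequence[float]) -> float:
    """Hypothetical scalarizer for models that expose probabilities.

    Assumes token_logprobs holds, for each token of the ORIGINAL output,
    its log-probability under the model given the PERTURBED input
    (teacher forcing). The scalar is the whole-sequence log-probability.
    """
    return float(sum(token_logprobs))


def unigram_f1_scalarizer(original_output: str, perturbed_output: str) -> float:
    """Hypothetical text-only scalarizer: unigram-overlap F1 between the
    original output and the output generated from the perturbed input.
    Needs only generated text, no access to logits."""
    ref = Counter(original_output.lower().split())
    cand = Counter(perturbed_output.lower().split())
    overlap = sum((ref & cand).values())  # multiset intersection of tokens
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


original = "the cat sat on the mat"
print(unigram_f1_scalarizer(original, "the cat sat"))  # about 0.667
print(logprob_scalarizer([-0.5, -1.0]))  # -1.5
```

The log-probability variant is finer-grained but requires white-box access, while the text-only variant works with any model that returns generated text, which is why scalarizers of the second kind matter for text-only models.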
Through these contributions, the paper aims to improve the interpretability of generative language models, enabling users to better understand and trust the outputs of these models.

Multi-Level Explanations for Generative Language Models
