Multi-Level Explanations for Generative Language Models

Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh
2024-03-21
Abstract: Perturbation-based explanation methods such as LIME and SHAP are commonly applied to text classification. This work focuses on their extension to generative language models. To address the challenges of text as output and long text inputs, we propose a general framework called MExGen that can be instantiated with different attribution algorithms. To handle text output, we introduce the notion of scalarizers for mapping text to real numbers and investigate multiple possibilities. To handle long inputs, we take a multi-level approach, proceeding from coarser levels of granularity to finer ones, and focus on algorithms with linear scaling in model queries. We conduct a systematic evaluation, both automated and human, of perturbation-based attribution methods for summarization and context-grounded question answering. The results show that our framework can provide more locally faithful explanations of generated outputs.
Computation and Language, Artificial Intelligence
What problem does this paper attempt to address?
The paper aims to extend perturbation-based input attribution methods to generative language models in order to explain the outputs of these models. It focuses on two main challenges:

1. **Text output**: Unlike classifiers, generative language models output text rather than real-valued numbers (such as predicted class log-probabilities). This calls for scalarizers, that is, functions \( S \) that map the output text to real numbers so that attribution analysis can be applied.
2. **Long text input**: Generative tasks such as summarization and context-grounded question answering usually involve long input texts. This increases the computational cost and also complicates the explanation itself, because users may want to understand the importance of the input at different levels of granularity.

To address these challenges, the paper proposes a general framework, **MExGen** (Multi-Level Explanations for Generative Language Models), which can be instantiated with different attribution algorithms and adopts a multi-level strategy, proceeding from coarser to finer granularity, to handle long input texts. The main contributions of the paper are:

- **Proposing the MExGen framework**: It extends perturbation-based input attribution methods to generative language models and uses a multi-level strategy to handle long input texts.
- **Investigating multiple scalarizers**: It explores several ways of mapping output text to real numbers, especially for models that provide only text output.
- **Systematic evaluation**: Through automated and human evaluations, it assesses perturbation-based attribution methods on summarization and context-grounded question answering, showing that MExGen can provide more locally faithful explanations.

Through these contributions, the paper aims to improve the interpretability of generative language models, enabling users to better understand and trust the outputs of these models.
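
To make the two ideas above concrete, the following is a minimal sketch of a scalarizer combined with a two-level (sentence, then word) attribution. It is not the paper's implementation: `toy_model`, `overlap_scalarizer`, `leave_one_out`, and `multi_level_explain` are hypothetical names, the "model" is a toy stand-in that returns the longest sentence, the word-overlap scalarizer is chosen only for simplicity, and plain leave-one-out is used as one example of an attribution algorithm whose number of model queries grows linearly with the number of input units.

```python
import re
from typing import Callable, List, Tuple


def split_sentences(text: str) -> List[str]:
    """Split text into sentences at whitespace following ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def toy_model(text: str) -> str:
    """Hypothetical stand-in for a generative model: returns the longest sentence."""
    sentences = split_sentences(text)
    return max(sentences, key=lambda s: len(s.split())) if sentences else ""


def overlap_scalarizer(reference: str, candidate: str) -> float:
    """Illustrative scalarizer: fraction of reference words present in candidate."""
    ref = set(reference.lower().split())
    cand = set(candidate.lower().split())
    return len(ref & cand) / len(ref) if ref else 0.0


def leave_one_out(
    units: List[str],
    rebuild: Callable[[List[str]], str],
    model: Callable[[str], str],
    scalarizer: Callable[[str, str], float],
    reference: str,
) -> List[float]:
    """Score each unit by how much the scalarized output drops when it is removed.

    Uses one model query per unit, so the cost is linear in len(units).
    """
    scores = []
    for i in range(len(units)):
        perturbed_input = rebuild(units[:i] + units[i + 1:])
        perturbed_output = model(perturbed_input)
        scores.append(1.0 - scalarizer(reference, perturbed_output))
    return scores


def multi_level_explain(
    text: str,
    model: Callable[[str], str],
    scalarizer: Callable[[str, str], float],
) -> Tuple[List[float], int, List[Tuple[str, float]]]:
    """Attribute at sentence level first, then refine only the top sentence into words."""
    reference = model(text)  # output for the unperturbed input
    sentences = split_sentences(text)

    # Coarse level: one score per sentence.
    sent_scores = leave_one_out(sentences, " ".join, model, scalarizer, reference)
    top = max(range(len(sentences)), key=sent_scores.__getitem__)

    # Fine level: perturb words inside the top sentence, keep other sentences intact.
    words = sentences[top].split()

    def rebuild(kept_words: List[str]) -> str:
        return " ".join(sentences[:top] + [" ".join(kept_words)] + sentences[top + 1:])

    word_scores = leave_one_out(words, rebuild, model, scalarizer, reference)
    return sent_scores, top, list(zip(words, word_scores))


text = "Cats sleep. The quick brown fox jumps over the lazy dog. Dogs bark loudly."
sent_scores, top, word_scores = multi_level_explain(text, toy_model, overlap_scalarizer)
print("sentence scores:", sent_scores)
print("most important sentence index:", top)
print("word scores:", word_scores)
```

In this sketch the fine level is applied only to the highest-scoring sentence, which keeps the total number of model queries small for long inputs. A real setup would replace `toy_model` with calls to an actual language model and would likely use a stronger similarity measure or model log-probabilities as the scalarizer.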