Explanation sensitivity to the randomness of large language models: the case of journalistic text classification

Jeremie Bogaert, Marie-Catherine de Marneffe, Antonin Descampe, Louis Escouflaire, Cedrick Fairon, Francois-Xavier Standaert

2024-10-07

Abstract: Large language models (LLMs) perform very well in several natural language processing tasks but raise explainability challenges. In this paper, we examine the effect of random elements in the training of LLMs on the explainability of their predictions. We do so on a task of opinionated journalistic text classification in French. Using a fine-tuned CamemBERT model and an explanation method based on relevance propagation, we find that training with different random seeds produces models with similar accuracy but variable explanations. We therefore claim that characterizing the explanations' statistical distribution is needed for the explainability of LLMs. We then explore a simpler model based on textual features which offers stable explanations but is less accurate. Hence, this simpler model corresponds to a different tradeoff between accuracy and explainability. We show that it can be improved by inserting features derived from CamemBERT's explanations. We finally discuss new research directions suggested by our results, in particular regarding the origin of the sensitivity observed in the training randomness.

Computation and Language

What problem does this paper attempt to address?

The paper examines how randomness in the training process affects the explainability of large language models (LLMs). Specifically, the authors ask whether the random elements of training change the explanations given for a model's predictions, using opinionated journalistic text classification in French as the task. They fine-tune CamemBERT with different random seeds and find that the resulting models reach similar accuracy but produce different explanations. They therefore argue that the statistical distribution of these explanations must be characterized for LLM explainability.

The main research questions of the paper are:

1. **How does training randomness affect the explainability of large language models?** Experiments show that models trained with different random seeds, despite similar prediction accuracy, exhibit variability in their explanations.
2. **How can this variability in explanations be characterized?** The paper argues that the distribution of explanations across models should be characterized statistically, in order to assess how consistent and reliable the explanations are.
3. **What trade-off between explainability and accuracy does a simpler model offer?** A simpler model based on textual features gives stable explanations but is less accurate. The paper shows that it can be improved by inserting features derived from CamemBERT's explanations.

The paper closes by discussing new research directions, in particular regarding the origin of the observed sensitivity to training randomness.
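To make the idea of "characterizing the statistical distribution of explanations" more concrete, here is a minimal sketch in Python. It is not the authors' method: the summary above does not say which statistics the paper uses. The sketch assumes each trained model (one per random seed) yields one relevance score per token for the same input text, and it reports a per-token mean and standard deviation across seeds plus the average pairwise Spearman rank correlation. The function names and the scores in the example are hypothetical.

```python
from itertools import combinations
from statistics import mean, stdev


def per_token_stats(attributions):
    """Return a (mean, std) pair per token across seeds.

    attributions: list of equal-length score lists, one list per seed.
    """
    return [(mean(column), stdev(column)) for column in zip(*attributions)]


def ranks(scores):
    """Rank positions 0..n-1 of the scores (assumes no tied scores)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    result = [0] * len(scores)
    for position, index in enumerate(order):
        result[index] = position
    return result


def spearman(u, v):
    """Spearman rank correlation via the no-ties formula."""
    n = len(u)
    d_squared = sum((a - b) ** 2 for a, b in zip(ranks(u), ranks(v)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def mean_pairwise_spearman(attributions):
    """Average Spearman correlation over all pairs of seeds."""
    pairs = list(combinations(attributions, 2))
    return sum(spearman(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up relevance scores for a 4-token text from three seeds.
    seed_runs = [
        [0.40, 0.10, 0.35, 0.15],
        [0.15, 0.05, 0.60, 0.20],
        [0.30, 0.25, 0.05, 0.40],
    ]
    for token_index, (m, s) in enumerate(per_token_stats(seed_runs)):
        print(f"token {token_index}: mean={m:.3f} std={s:.3f}")
    print("mean pairwise Spearman:", round(mean_pairwise_spearman(seed_runs), 3))
```

A low average rank correlation alongside similar accuracies would be one sign of the seed sensitivity the paper describes. The rank formula used here assumes no tied scores; for real attribution data, a tie-aware implementation such as `scipy.stats.spearmanr` would be a safer choice.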

A Question on the Explainability of Large Language Models and the Word-Level Univariate First-Order Plausibility Assumption

Explaining machine learning models using entropic variable projection

Investigating the Impact of Model Instability on Explanations and Uncertainty

Towards Uncovering How Large Language Model Works: An Explainability Perspective

Evaluating the Reliability of Self-Explanations in Large Language Models

A Methodology for Explainable Large Language Models with Integrated Gradients and Linguistic Analysis in Text Classification

Quantifying Uncertainty in Natural Language Explanations of Large Language Models

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

The Effect of Model Size on LLM Post-hoc Explainability via LIME

Explainability for Large Language Models: A Survey

Model Explainability in Deep Learning Based Natural Language Processing

Uncertainty-Aware Explainable Recommendation with Large Language Models

Trusting deep learning natural-language models via local and global explanations

Logistic Regression makes small LLMs strong and explainable "tens-of-shot" classifiers

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

"Why Should You Trust My Explanation?" Understanding Uncertainty in LIME Explanations

Large Language Models Cannot Explain Themselves

Embers of autoregression show how large language models are shaped by the problem they are trained to solve

Local Explanations and Self-Explanations for Assessing Faithfulness in black-box LLMs