Racing Thoughts: Explaining Large Language Model Contextualization Errors

Michael A. Lepori, Michael Mozer, Asma Ghandeharioun

2024-10-03

Abstract: The profound success of transformer-based language models can largely be attributed to their ability to integrate relevant contextual information from an input sequence in order to generate a response or complete a task. However, we know very little about the algorithms that a model employs to implement this capability, nor do we understand their failure modes. For example, given the prompt "John is going fishing, so he walks over to the bank. Can he make an ATM transaction?", a model may incorrectly respond "Yes" if it has not properly contextualized "bank" as a geographical feature, rather than a financial institution. We propose the LLM Race Conditions Hypothesis as an explanation of contextualization errors of this form. This hypothesis identifies dependencies between tokens (e.g., "bank" must be properly contextualized before the final token, "?", integrates information from "bank"), and claims that contextualization errors are a result of violating these dependencies. Using a variety of techniques from mechanistic interpretability, we provide correlational and causal evidence in support of the hypothesis, and suggest inference-time interventions to address it.

Computation and Language

What problem does this paper attempt to address?

This paper addresses contextualization errors in large language models (LLMs). When processing an input sequence, a model sometimes fails to interpret a polysemous word or phrase according to its context, and so produces an inaccurate or inconsistent response. For example, given the prompt "John is going fishing, so he walks over to the bank. Can he make an ATM transaction?", a model that has not contextualized "bank" as a geographical feature rather than a financial institution may wrongly answer "Yes". The paper proposes the "LLM Race Conditions Hypothesis" to explain such errors. The hypothesis holds that they arise from "race conditions" in the model's layer-by-layer processing: a later token (such as the question mark at the end of the prompt) integrates information from a crucial earlier token (such as "bank") before that token has been properly contextualized. Violating this ordering leads the model to generate an incorrect answer. To test the hypothesis, the authors use a variety of mechanistic interpretability techniques to provide correlational and causal evidence in its support, and they suggest inference-time interventions to address the problem. The paper also explores how prevalent these race conditions are in feed-forward language models and points to potential solutions.
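
The ordering constraint behind the hypothesis can be illustrated with a toy sketch. This is not the authors' method or code: the `Dependency` class, the `violates` function, and all depth values below are hypothetical, chosen only to show what "reading a token before it is contextualized" means. The sketch assumes a simplified picture in which a token's representation becomes contextualized at some depth and a later token reads it at some depth.

```python
from dataclasses import dataclass


@dataclass
class Dependency:
    """Hypothetical token dependency: `reader` integrates information from `source`."""
    source: str        # token that must be contextualized first, e.g. "bank"
    reader: str        # token that later reads from it, e.g. "?"
    ready_depth: int   # assumed depth at which `source` holds its contextualized meaning
    read_depth: int    # assumed depth at which `reader` integrates `source`


def violates(dep: Dependency) -> bool:
    """Return True if the reader integrates the source before it is contextualized."""
    return dep.read_depth < dep.ready_depth


# Made-up depths for illustration only; they are not measurements from any model.
in_order = Dependency("bank", "?", ready_depth=8, read_depth=12)
race = Dependency("bank", "?", ready_depth=14, read_depth=12)

for dep in (in_order, race):
    status = "race condition" if violates(dep) else "dependency respected"
    print(f'"{dep.reader}" reads "{dep.source}" at depth {dep.read_depth} '
          f"(ready at depth {dep.ready_depth}): {status}")
```

In this simplified picture, the second case corresponds to the failure described above: "?" integrates "bank" while "bank" may still carry its financial-institution reading.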

Insights into LLM Long-Context Failures: When Transformers Know but Don't Tell

Why Larger Language Models Do In-context Learning Differently?

Context Matter: Data-Efficient Augmentation of Large Language Models for Scientific Applications

Language Models Don't Always Say What They Think: Unfaithful Explanations in Chain-of-Thought Prompting

Enhancing Large Language Models' Situated Faithfulness to External Contexts

Can Large Language Models Understand Context?

Explainability for Large Language Models: A Survey

Improving Large Language Model (LLM) fidelity through context-aware grounding: A systematic approach to reliability and veracity

Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve

In-context Interference in Chat-based Large Language Models

Contextualized Evaluations: Taking the Guesswork Out of Language Model Evaluations

Mechanistic interpretability of large language models with applications to the financial services industry

Embers of autoregression show how large language models are shaped by the problem they are trained to solve

Explaining How Transformers Use Context to Build Predictions

Large Language Models Cannot Explain Themselves

Post Hoc Explanations of Language Models Can Improve Language Models

Towards a Theoretical Understanding of the 'Reversal Curse' via Training Dynamics

End-to-End Speech Recognition Contextualization with Large Language Models

From Words to Numbers: Your Large Language Model Is Secretly A Capable Regressor When Given In-Context Examples