In-Context Learning Dynamics with Random Binary Sequences

Eric J. Bigelow, Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka, Tomer D. Ullman

2024-04-16

Abstract: Large language models (LLMs) trained on huge corpora of text datasets demonstrate intriguing capabilities, achieving state-of-the-art performance on tasks they were not explicitly trained for. The precise nature of LLM capabilities is often mysterious, and different prompts can elicit different capabilities through in-context learning. We propose a framework that enables us to analyze in-context learning dynamics to understand latent concepts underlying LLMs' behavioral patterns. This provides a more nuanced understanding than success-or-failure evaluation benchmarks, but does not require observing internal activations as a mechanistic interpretation of circuits would. Inspired by the cognitive science of human randomness perception, we use random binary sequences as context and study dynamics of in-context learning by manipulating properties of context data, such as sequence length. In the latest GPT-3.5+ models, we find emergent abilities to generate seemingly random numbers and learn basic formal languages, with striking in-context learning dynamics where model outputs transition sharply from seemingly random behaviors to deterministic repetition.
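To make "seemingly random" versus "deterministic repetition" concrete, a binary sequence can be scored with a few simple statistics. This is a sketch under stated assumptions, not the paper's evaluation: `alternation_rate`, `longest_run`, and `smallest_period` are hypothetical helpers. A fair coin switches symbols on about half of adjacent pairs, while a sequence that has collapsed into repeating a short cycle has a period much smaller than its length.

```python
from itertools import groupby


def alternation_rate(seq: str) -> float:
    """Fraction of adjacent pairs whose symbols differ (about 0.5 for a fair coin)."""
    if len(seq) < 2:
        return 0.0
    switches = sum(a != b for a, b in zip(seq, seq[1:]))
    return switches / (len(seq) - 1)


def longest_run(seq: str) -> int:
    """Length of the longest block of identical consecutive symbols."""
    return max((len(list(group)) for _, group in groupby(seq)), default=0)


def smallest_period(seq: str) -> int:
    """Shortest p with seq[i] == seq[i + p] for every valid i.

    p == len(seq) means no shorter cycle explains the sequence; a p much
    smaller than len(seq) indicates deterministic repetition.
    """
    n = len(seq)
    for p in range(1, n + 1):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return 0  # only reached for the empty sequence


if __name__ == "__main__":
    # Example strings chosen only for illustration.
    for s in ["0110100110010110", "0101010101010101"]:
        print(s, alternation_rate(s), longest_run(s), smallest_period(s))
```

For the strictly alternating string `0101010101010101`, the alternation rate is 1.0, the longest run is 1, and the smallest period is 2, the signature of deterministic repetition rather than coin-flip behavior.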

Artificial Intelligence, Computation and Language, Machine Learning

What problem does this paper attempt to address?

The paper explores how large language models (LLMs) behave on specific tasks and what internal mechanisms drive that behavior. In particular, it asks how different prompts activate latent concepts or algorithms inside an LLM during in-context learning (ICL), and how those concepts shape the model's behavioral patterns. The authors propose a framework for analyzing in-context learning dynamics that can reveal the concepts underlying an LLM's behavior without observing hidden-unit activations or retraining the model.

To do this, they use random binary sequences as the object of study and track how LLM behavior changes as properties of the input context, such as sequence length, are varied. In the latest GPT-3.5+ models, they find emergent abilities to generate seemingly random numbers and to learn basic formal languages, along with striking in-context learning dynamics: model outputs transition sharply from seemingly random behavior to deterministic repetition.

The core question, then, is how analyzing an LLM's behavioral dynamics on random binary sequences can help explain its complex internal capabilities, and whether in-context learning can be understood as a process of model selection.
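The model-selection view can be illustrated with a toy Bayesian comparison between a "random" hypothesis and "deterministic repetition" hypotheses. This is a hypothetical sketch, not the authors' model: the hypothesis space (a fair coin versus binary cycles of period at most `max_period`), the prior `prior_random`, and the noise rate `eps` are all assumptions chosen for illustration.

```python
from itertools import product


def cycle_hypotheses(max_period: int):
    """Yield every binary cycle of length 1..max_period ('0', '1', '00', '01', ...)."""
    for k in range(1, max_period + 1):
        for bits in product("01", repeat=k):
            yield "".join(bits)


def posterior_random(context: str, max_period: int = 4,
                     prior_random: float = 0.9, eps: float = 0.01) -> float:
    """Posterior probability that `context` came from a fair coin.

    Hypothesis space (a hypothetical toy choice):
      * a fair coin, where every bit has probability 0.5;
      * one hypothesis per cycle of length <= max_period, where each bit
        equals the cycle's bit with probability 1 - eps.
    Equivalent cycles such as '01' and '0101' are counted separately.
    """
    cycles = list(cycle_hypotheses(max_period))
    prior_cycle = (1.0 - prior_random) / len(cycles)

    # Joint probability P(H) * P(context | H) for the fair-coin hypothesis.
    joint_random = prior_random * 0.5 ** len(context)

    # Sum of P(H) * P(context | H) over all cycle hypotheses.
    joint_cycles = 0.0
    for cycle in cycles:
        likelihood = 1.0
        for i, bit in enumerate(context):
            likelihood *= (1 - eps) if bit == cycle[i % len(cycle)] else eps
        joint_cycles += prior_cycle * likelihood

    # Bayes' rule: normalize over the whole hypothesis space.
    return joint_random / (joint_random + joint_cycles)


if __name__ == "__main__":
    # Contexts that keep repeating the cycle '01', of increasing length.
    for n in (2, 4, 8, 12, 16):
        context = ("01" * n)[:n]
        print(n, round(posterior_random(context), 4))
```

Because the fair-coin likelihood halves with every bit while a matching cycle's likelihood shrinks only by a factor of 1 - eps, the posterior on the random hypothesis collapses after a few repetitions. With these settings it stays near the prior (about 0.91) for the context `01`, but falls below 0.01 for 16 bits of `0101...`, a qualitative analogue of the sharp transition from random to deterministic behavior.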

Related papers

In-Context Language Learning: Architectures and Algorithms

An Explanation of In-context Learning as Implicit Bayesian Inference

A Theory of Emergent In-Context Learning as Implicit Structure Induction

What Do Language Models Learn in Context? The Structured Task Hypothesis

LLMs Are In-Context Reinforcement Learners

In-context Learning Generalizes, But Not Always Robustly: The Case of Syntax

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

The broader spectrum of in-context learning

Probing the Decision Boundaries of In-context Learning in Large Language Models

Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models

Spin glass model of in-context learning

A Data Generation Perspective to the Mechanism of In-Context Learning

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent As Meta-Optimizers

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

Meta-in-context learning in large language models

Bayesian scaling laws for in-context learning

Can Large Language Models Understand Context?

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning