Abstract: Large-scale neural language models (LMs) exhibit a remarkable capacity for in-context learning (ICL): they can infer novel functions from datasets provided as input. Most of our current understanding of when and how ICL arises comes from LMs trained on extremely simple learning problems, such as linear regression and associative recall. A significant gap remains between these model problems and the "real" ICL exhibited by LMs trained on large text corpora, which involves not just retrieval and function approximation but free-form generation of language and other structured outputs. In this paper, we study ICL through the lens of a new family of model problems we term in-context language learning (ICLL). In ICLL, LMs are presented with a set of strings from a formal language and must generate additional strings from the same language. We focus on in-context learning of regular languages generated by random finite automata. We evaluate a diverse set of neural sequence models (including several RNNs, Transformers, and state-space model variants) on regular ICLL tasks, aiming to answer three questions: (1) Which model classes are empirically capable of ICLL? (2) What algorithmic solutions do successful models implement to perform ICLL? (3) What architectural changes can improve ICLL in less performant models? We first show that Transformers significantly outperform neural sequence models with recurrent or convolutional representations on ICLL tasks. Next, we provide evidence that their ability to do so relies on specialized "n-gram heads" (higher-order variants of induction heads) that compute input-conditional next-token distributions. Finally, we show that hard-wiring these heads into neural models improves performance not just on ICLL but also on natural language modeling, reducing the perplexity of 340M-parameter models by up to 1.14 points (6.7%) on the SlimPajama dataset.
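To make the ICLL setup concrete, the following is a minimal Python sketch of how a regular ICLL prompt could be assembled. It assumes a random deterministic finite automaton (DFA) in which every state is accepting and each state has a fixed number of labeled outgoing edges, with strings sampled by a uniform random walk. The functions `random_dfa`, `sample_string`, `accepts`, and `make_icll_prompt`, along with all of their parameters and the `" | "` separator, are hypothetical illustrations. They are not the paper's data generator.

```python
import random
from typing import Dict, List

# Hypothetical DFA encoding: transitions[state][symbol] -> next state.
# Assumption: every state is accepting, so a string is in the language
# exactly when each of its symbols has a defined transition.
DFA = Dict[int, Dict[str, int]]


def random_dfa(num_states: int, alphabet: List[str], out_degree: int,
               rng: random.Random) -> DFA:
    """Give each state `out_degree` distinct outgoing symbols with random targets."""
    dfa: DFA = {}
    for state in range(num_states):
        symbols = rng.sample(alphabet, out_degree)  # distinct symbols keep it deterministic
        dfa[state] = {sym: rng.randrange(num_states) for sym in symbols}
    return dfa


def sample_string(dfa: DFA, length: int, rng: random.Random,
                  start: int = 0) -> List[str]:
    """Uniform random walk of `length` steps from `start`, recording edge labels."""
    state, out = start, []
    for _ in range(length):
        sym = rng.choice(sorted(dfa[state]))
        out.append(sym)
        state = dfa[state][sym]
    return out


def accepts(dfa: DFA, string: List[str], start: int = 0) -> bool:
    """Membership test under the all-states-accepting assumption."""
    state = start
    for sym in string:
        if sym not in dfa[state]:
            return False
        state = dfa[state][sym]
    return True


def make_icll_prompt(dfa: DFA, num_examples: int, length: int,
                     rng: random.Random, sep: str = " | ") -> str:
    """Concatenate sampled strings into one in-context prompt."""
    examples = [" ".join(sample_string(dfa, length, rng)) for _ in range(num_examples)]
    return sep.join(examples)


if __name__ == "__main__":
    rng = random.Random(0)
    dfa = random_dfa(num_states=4, alphabet=list("abcdef"), out_degree=2, rng=rng)
    print(make_icll_prompt(dfa, num_examples=3, length=6, rng=rng))
```

Under this sketch, a model would receive the prompt and be asked to continue it with another string. Its output could then be scored with `accepts` against the same hidden automaton. Seeding `random.Random` makes both the automaton and the prompt reproducible.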

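The "n-gram heads" described in the abstract compute next-token distributions conditioned on the input. As a rough reference point, the sketch below computes that kind of distribution directly by counting. It takes the last n-1 tokens of the context, finds their earlier occurrences, and normalizes the counts of the tokens that followed them. This is a count-based illustration under our own assumptions. It is not an attention layer and not the authors' hard-wired head. The function name `in_context_ngram_distribution` is hypothetical. With n = 2 it roughly matches the "find the previous occurrence of the current token and copy what followed" pattern usually attributed to induction heads.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def in_context_ngram_distribution(tokens: Sequence[Hashable],
                                  n: int) -> Dict[Hashable, float]:
    """Next-token distribution estimated from n-gram counts within the context.

    Matches the last n-1 tokens against every earlier window of n-1 tokens and
    normalizes the counts of the tokens that followed each match. n = 1 gives
    the unigram distribution of the context.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    k = n - 1
    if len(tokens) < k:
        return {}  # not enough context to form the query
    query = tuple(tokens[len(tokens) - k:])
    counts: Counter = Counter()
    # tokens[i] is a candidate continuation of the window tokens[i-k:i].
    for i in range(k, len(tokens)):
        if tuple(tokens[i - k:i]) == query:
            counts[tokens[i]] += 1
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()} if total else {}
```

For example, on the context `list("xbcabdab")`, `n=2` conditions only on the final `b` and returns `{'c': 0.5, 'd': 0.5}`. In contrast, `n=3` conditions on the final `ab` and returns `{'d': 1.0}`. This illustrates why higher-order heads can give sharper, though sparser, predictions than a plain induction pattern.
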
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

A Survey on In-context Learning

In-Context Learning for Text Classification with Many Labels

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

Implicit In-context Learning

Disentangling the Prosody and Semantic Information with Pre-trained Model for In-Context Learning based Zero-Shot Voice Conversion

Towards Multimodal In-Context Learning for Vision & Language Models

In-Context Language Learning: Architectures and Algorithms

Inference and Verbalization Functions During In-Context Learning

Multimodal Contrastive In-Context Learning

VL-ICL Bench: The Devil in the Details of Multimodal In-Context Learning

Pre-Training to Learn in Context

LLMs Are In-Context Reinforcement Learners

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

Can MLLMs Perform Text-to-Image In-Context Learning?

Do pretrained Transformers Learn In-Context by Gradient Descent?

Towards Understanding In-Context Learning with Contrastive Demonstrations and Saliency Maps

Knowledgeable In-Context Tuning: Exploring and Exploiting Factual Knowledge for In-Context Learning

Let's Learn Step by Step: Enhancing In-Context Learning Ability with Curriculum Learning

What Do Language Models Learn in Context? The Structured Task Hypothesis