Abstract: In-context Learning (ICL) is an emerging few-shot learning paradigm for Language Models (LMs) whose inner mechanisms remain unexplored. Existing works describe the inner processing of ICL, but they struggle to capture all the inference phenomena in large language models. Therefore, this paper proposes a comprehensive circuit to model the inference dynamics and tries to explain the observed phenomena of ICL. In detail, we divide ICL inference into 3 major operations: (1) Summarize: LMs encode every input text (demonstrations and queries) into a linear representation in the hidden states with sufficient information to solve ICL tasks. (2) Semantics Merge: LMs merge the encoded representations of demonstrations with their corresponding label tokens to produce joint representations of labels and demonstrations. (3) Feature Retrieval and Copy: LMs search for the joint representations similar to the query representation on a task subspace, and copy the retrieved representations into the query. Then, language model heads capture these copied label representations to a certain extent and decode them into predicted labels. The proposed inference circuit successfully captures many phenomena observed during the ICL process, making it a comprehensive and practical explanation of the ICL inference process. Moreover, ablation analysis shows that disabling the proposed steps seriously damages ICL performance, suggesting the proposed inference circuit is a dominant mechanism. Additionally, we confirm and list some bypass mechanisms that solve ICL tasks in parallel with the proposed circuit.

What problem does this paper attempt to address?

The problem this paper addresses is that the In-Context Learning (ICL) mechanism in large language models is not yet fully understood. Although some works have described the internal processing of ICL, they do not capture all the inference phenomena observed in large language models. This paper therefore proposes a comprehensive inference circuit to explain the observed phenomena of ICL. Specifically:

1. **Problem Background**:
   - In-context learning (ICL) is an emerging few-shot learning paradigm that lets a language model complete tasks from a small number of examples without additional training.
   - Although ICL has attracted wide attention, its underlying mechanism remains unclear.
   - Existing research has attempted to explain the inference process of ICL, but it does not capture all the operational dynamics and phenomena observed in large language models.

2. **Research Objectives**:
   - Propose a unified inference circuit that explains the ICL inference process more comprehensively.
   - Verify the proposed circuit through experiments and use it to explore phenomena observed in the ICL process, such as position bias, noise robustness, and example saturation.

3. **Main Contributions**:
   - Propose a three-step inference circuit:
     (1) Summarize: the language model encodes each input text (demonstrations and queries) into a linear representation in the hidden states;
     (2) Semantics Merge: the language model merges the encoded representation of each demonstration with its corresponding label token;
     (3) Feature Retrieval and Copy: the language model retrieves the merged label representations that are similar to the query representation in a task-related subspace and copies them into the query representation.
   - Verify the existence of each step through detailed experiments and capture many of the phenomena observed in the ICL process.
   - Show through ablation analysis that the proposed circuit is the dominant mechanism, while some bypass mechanisms also exist.

In summary, this paper aims to explain the in-context learning mechanism in large language models by proposing a comprehensive inference circuit, and it supports the circuit's validity and practicality through experiments.
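The three operations described above can be illustrated with a toy NumPy sketch. This is not the paper's implementation or a model of any real LM. It makes several assumptions: hand-written 2-D feature vectors stand in for hidden states, the joint representation is a simple concatenation `[demonstration summary | one-hot label]`, the "task subspace" is just the first block of that vector, and retrieval is a single softmax attention step. All function names, the label set, and the `beta` sharpness parameter are hypothetical.

```python
import numpy as np

LABELS = ["negative", "positive"]  # hypothetical label set for the toy task


def summarize(features):
    """Step 1 (Summarize): stand-in encoder that returns unit-length vectors."""
    x = np.asarray(features, dtype=float)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def semantics_merge(demo_reprs, label_ids, merge=True):
    """Step 2 (Semantics Merge): build [demo summary | one-hot label] per demonstration.

    With merge=False the demo summary is zeroed out, which mimics ablating this step.
    """
    onehot = np.eye(len(LABELS))[label_ids]
    demo_part = demo_reprs if merge else np.zeros_like(demo_reprs)
    return np.concatenate([demo_part, onehot], axis=1)


def retrieve_and_copy(query_repr, joint_reprs, beta=4.0):
    """Step 3 (Feature Retrieval and Copy): attend to similar joint representations
    on the toy task subspace and copy their label part to the query position."""
    d = query_repr.shape[0]
    keys = joint_reprs[:, :d]            # toy "task subspace": the demo-summary block
    scores = beta * (keys @ query_repr)  # similarity between query and each demonstration
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()             # softmax attention weights
    return weights @ joint_reprs[:, d:]  # copied label information


def predict(demo_features, label_ids, query_features, merge=True):
    demos = summarize(demo_features)
    query = summarize(query_features)
    joint = semantics_merge(demos, label_ids, merge=merge)
    label_mass = retrieve_and_copy(query, joint)
    return LABELS[int(np.argmax(label_mass))]  # stand-in for the LM head decoding


# Three demonstrations: two "positive" (label 1) and one "negative" (label 0).
demos = [[1.0, 0.2], [-1.0, 0.1], [0.9, -0.3]]
labels = [1, 0, 1]

print(predict(demos, labels, [0.8, 0.0]))                # positive
print(predict(demos, labels, [-0.9, 0.0]))               # negative
print(predict(demos, labels, [-0.9, 0.0], merge=False))  # positive (falls back to label frequency)
```

In this toy setting, disabling the merge step leaves the retrieval step with nothing to match against, so the prediction collapses to the majority label among the demonstrations. That is only an analogy for the ablation result summarized above, not a reproduction of it.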

Revisiting In-context Learning Inference Circuit in Large Language Models

Does In-Context Learning Really Learn? Rethinking How Large Language Models Respond and Solve Tasks via In-Context Learning

Inference and Verbalization Functions During In-Context Learning

In-Context Language Learning: Architectures and Algorithms

Decoding In-Context Learning: Neuroscience-inspired Analysis of Representations in Large Language Models

Large Language Models Know What Makes Exemplary Contexts

ICLEval: Evaluating In-Context Learning Ability of Large Language Models

Why Larger Language Models Do In-context Learning Differently?

Many-Shot In-Context Learning

Implicit In-context Learning

A Survey on In-context Learning

Unveiling In-Context Learning: A Coordinate System to Understand Its Working Mechanism

What In-Context Learning "Learns" In-Context: Disentangling Task Recognition and Task Learning

In-Context Learning for Text Classification with Many Labels

Hint-enhanced In-Context Learning wakes Large Language Models up for knowledge-intensive tasks

Explaining Emergent In-Context Learning as Kernel Regression

"In-Context Learning" or: How I learned to stop worrying and love "Applied Information Retrieval"

In-Context Learning Learns Label Relationships but Is Not Conventional Learning

Competition Dynamics Shape Algorithmic Phases of In-Context Learning

Towards Multimodal In-Context Learning for Vision & Language Models

From Unstructured Data to In-Context Learning: Exploring What Tasks Can Be Learned and When