Abstract: Humans distill complex experiences into fundamental abstractions that enable rapid learning and adaptation. Similarly, autoregressive transformers exhibit adaptive learning through in-context learning (ICL), which raises the question of how. In this paper, we propose a concept encoding-decoding mechanism to explain ICL by studying how transformers form and use internal abstractions in their representations. On synthetic ICL tasks, we analyze the training dynamics of a small transformer and report the coupled emergence of concept encoding and decoding. As the model learns to encode different latent concepts (e.g., "Finding the first noun in a sentence.") into distinct, separable representations, it concurrently builds conditional decoding algorithms and improves its ICL performance. We validate the existence of this mechanism across pretrained models of varying scales (Gemma-2 2B/9B/27B, Llama-3.1 8B/70B). Further, through mechanistic interventions and controlled finetuning, we demonstrate that the quality of concept encoding is causally related to, and predictive of, ICL performance. Our empirical insights shed light on the success and failure modes of large language models via their representations.
What problem does this paper attempt to address?
This paper seeks to understand how autoregressive transformers form and use internal abstract representations when performing in-context learning (ICL). Specifically, the authors propose a concept encoding-decoding mechanism: the model encodes the latent concept behind the in-context examples into its representations and conditions its decoding algorithm on that encoding.
### Main problems
1. **How to explain the behavior of transformers in ICL?**
- By studying how transformers form and use internal abstractions in their representations, the authors propose the concept encoding-decoding mechanism as an explanation.
2. **Why do task vectors appear in pretrained language models?**
- Through synthetic experiments, the authors show that concept encoding and decoding emerge simultaneously, and that task vectors arise naturally from this mechanism.
3. **Is there a causal relationship between the quality of concept encoding and ICL performance?**
- Through causal interventions and controlled finetuning experiments, the authors show that the quality of concept encoding both predicts and influences ICL performance.
### Specific problems and solutions
#### 1. How does concept encoding-decoding behavior emerge during training?
The authors train a small transformer on synthetic ICL tasks and track its training dynamics. As training progresses, the model gradually learns to encode different latent concepts into distinct, separable representations while simultaneously developing conditional decoding algorithms. Encoding and decoding emerge in a coupled manner, suggesting that the two are mutually dependent.
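The synthetic setup can be illustrated with a small data generator. The sketch below is an assumption, loosely modeled on the basis-function tasks mentioned in the ablation (number of bases, orthogonal vs. non-orthogonal bases): each latent concept is a subset of basis vectors, and only those bases contribute to the target. The helpers `make_bases` and `sample_episode` are hypothetical, and the paper's actual task definition may differ.

```python
import itertools
import numpy as np

def make_bases(dim, n_bases, orthogonal=True, rng=None):
    """Return an (n_bases, dim) matrix of unit-norm basis vectors (hypothetical helper)."""
    rng = np.random.default_rng() if rng is None else rng
    if orthogonal:
        if n_bases > dim:
            raise ValueError("cannot have more orthogonal bases than dimensions")
        # QR of a random square matrix gives an orthogonal matrix; keep n_bases columns.
        q, _ = np.linalg.qr(rng.standard_normal((dim, dim)))
        return q[:, :n_bases].T
    # Non-orthogonal variant: random unit vectors that may overlap with each other.
    b = rng.standard_normal((n_bases, dim))
    return b / np.linalg.norm(b, axis=1, keepdims=True)

def sample_episode(bases, concept, n_examples, rng):
    """Sample one ICL sequence (xs, ys) for a latent concept given as a tuple of basis indices."""
    dim = bases.shape[1]
    xs = rng.standard_normal((n_examples, dim))
    # Only the bases selected by the latent concept contribute to y.
    w = rng.standard_normal(len(concept))
    ys = xs @ bases[list(concept)].T @ w
    return xs, ys

rng = np.random.default_rng(0)
bases = make_bases(dim=8, n_bases=4, orthogonal=True, rng=rng)
concepts = list(itertools.combinations(range(4), 2))  # 6 latent concepts (pairs of bases)
xs, ys = sample_episode(bases, concepts[0], n_examples=10, rng=rng)
```

A model trained on many such episodes must infer which subset of bases is active from the context before it can predict `ys` for a new query, which is the "concept inference, then conditional decoding" structure described above.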
#### 2. What is the relationship between the model's ability to accurately infer latent concepts and downstream ICL performance?
The authors introduce Concept Decodability (CD), a geometric measure of how well the latent concept can be read out from the model's internal representations, and show that CD effectively predicts the ICL performance of pretrained language models. The relationship holds across tasks, model families, and model scales.
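A minimal sketch of a k-NN-based decodability score follows, assuming CD is the leave-one-out accuracy of a k-NN classifier that predicts each prompt's latent concept from one layer's representations. The function name `concept_decodability`, Euclidean distance, majority voting, and the default `k=5` are illustrative choices, not details confirmed by the source.

```python
import numpy as np

def concept_decodability(reps, labels, k=5):
    """Leave-one-out k-NN accuracy of predicting the latent concept from representations.

    reps:   (n, d) array, e.g. last-token hidden states at one layer (assumed input).
    labels: (n,) non-negative integer latent-concept ids.
    """
    reps = np.asarray(reps, dtype=float)
    labels = np.asarray(labels)
    # Pairwise squared Euclidean distances.
    sq = np.sum(reps ** 2, axis=1)
    dists = sq[:, None] + sq[None, :] - 2.0 * reps @ reps.T
    np.fill_diagonal(dists, np.inf)  # a point may not vote for itself
    nn = np.argsort(dists, axis=1)[:, :k]
    correct = 0
    for i in range(len(labels)):
        votes = np.bincount(labels[nn[i]], minlength=labels.max() + 1)
        correct += int(np.argmax(votes) == labels[i])
    return correct / len(labels)

# Toy check: two well-separated clusters of "representations".
rng = np.random.default_rng(0)
reps = np.vstack([rng.normal(0.0, 0.1, (20, 4)), rng.normal(5.0, 0.1, (20, 4))])
labels = np.array([0] * 20 + [1] * 20)
print(concept_decodability(reps, labels))  # 1.0 for clearly separated concepts
```

Under this reading, a higher score means the concepts occupy more separable regions of representation space; computing it per layer and per number of in-context examples gives curves that can be compared against ICL accuracy.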
#### 3. Does the concept encoding-decoding mechanism still hold in more complex tasks?
To test the robustness of the mechanism, the authors run an ablation study that increases the number of basis functions and introduces non-orthogonal bases. Although some overlapping concepts are not fully separated in these harder settings, the concept encoding-decoding mechanism still holds overall.
### Experimental verification
The authors verify the mechanism empirically on pretrained language models (Gemma-2 and Llama-3.1) and test two hypotheses:
- **Hypothesis 1: Pretrained language models exhibit concept encoding-decoding behavior.**
- UMAP visualizations of layer representations show that as the number of in-context examples increases, representations of different latent concepts become increasingly separated, most clearly in the intermediate layers.
- **Hypothesis 2: Concept encoding triggers different decoding algorithms, and the two are causally linked.**
- Causal interventions show that helping or hindering the model's inference of the latent concept changes its downstream ICL performance accordingly, supporting a causal connection between concept encoding and decoding.
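One way to picture a "help" or "hinder" intervention is to patch a hidden state toward the mean representation of a chosen concept. The sketch below assumes this mean-patching scheme, which may differ from the paper's exact intervention; `concept_means`, `patch`, and the nearest-centroid `nearest_concept` (a stand-in for the model's downstream decoding) are hypothetical. In a real model, the patched vector would be written back into the forward pass at the chosen layer, for example with a PyTorch forward hook.

```python
import numpy as np

def concept_means(reps, labels):
    """Mean representation per latent concept: {concept_id: (d,) vector}."""
    reps, labels = np.asarray(reps, dtype=float), np.asarray(labels)
    return {c: reps[labels == c].mean(axis=0) for c in np.unique(labels)}

def patch(h, means, concept, alpha=1.0):
    """Interpolate hidden state h toward the mean of `concept` (alpha=1 replaces it)."""
    return (1.0 - alpha) * np.asarray(h, dtype=float) + alpha * means[concept]

def nearest_concept(h, means):
    """Toy decoder stand-in: the concept whose mean is closest to h."""
    return min(means, key=lambda c: np.linalg.norm(h - means[c]))

rng = np.random.default_rng(1)
reps = np.vstack([rng.normal(0.0, 0.1, (10, 4)), rng.normal(3.0, 0.1, (10, 4))])
labels = np.array([0] * 10 + [1] * 10)
means = concept_means(reps, labels)
h = reps[0]                               # a prompt whose true concept is 0
helped = patch(h, means, concept=0)       # pushed onto the correct concept
hindered = patch(h, means, concept=1)     # pushed onto a wrong concept
print(nearest_concept(helped, means), nearest_concept(hindered, means))  # 0 1
```

Sweeping `alpha` between 0 and 1 gives a graded version of the intervention, which is one way to check whether downstream accuracy tracks how strongly the concept encoding is shifted.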
### Conclusion
Taken together, these results support concept encoding-decoding as a key mechanism by which transformers form internal abstractions during ICL. The mechanism explains the emergence of task vectors and reveals a causal relationship between the quality of concept encoding and ICL performance, offering a new perspective for understanding and improving large language models.
### Formula presentation
- **ICL formula in the Bayesian framework**:
\[
P_\theta(y^*\mid x^*, D)=\int P_\theta(y^*\mid x^*, z)\,P_\theta(z\mid D)\,dz
\]
where \(z\) is the latent concept, \(D\) is the set of in-context examples, \(x^*\) is the query input and \(y^*\) its output, \(P_\theta(z|D)\) is the posterior over latent concepts inferred from the in-context examples, and \(P_\theta(y^*|x^*, z)\) is the probability of generating the output given the inferred concept.
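When the latent concept takes finitely many values, as in the synthetic tasks, the integral becomes a sum over concepts, with \(P_\theta(z|D)\) obtained from Bayes' rule, \(P_\theta(z|D) \propto P_\theta(D|z)P(z)\). The toy numbers below are made up for illustration, and the function names are hypothetical.

```python
import numpy as np

def posterior_over_concepts(log_lik_D, prior):
    """P(z|D) proportional to P(D|z) P(z), computed in log-space for numerical stability."""
    logp = np.asarray(log_lik_D, dtype=float) + np.log(prior)
    logp -= logp.max()
    p = np.exp(logp)
    return p / p.sum()

def predictive(p_y_given_xz, post):
    """P(y*|x*,D) = sum_z P(y*|x*,z) P(z|D); rows index z, columns index y*."""
    return post @ np.asarray(p_y_given_xz, dtype=float)

prior = np.array([0.5, 0.5])                    # two candidate concepts, uniform prior
log_lik_D = np.log([0.08, 0.02])                # context is 4x more likely under concept 0
post = posterior_over_concepts(log_lik_D, prior)  # approximately [0.8, 0.2]
p_y_given_xz = np.array([[0.9, 0.1],            # concept 0 favors output 0
                         [0.2, 0.8]])           # concept 1 favors output 1
print(predictive(p_y_given_xz, post))           # approximately [0.76, 0.24]
```

The prediction is a posterior-weighted mixture of each concept's output distribution, so the sharper the concept inference from \(D\), the closer the prediction gets to the correct concept's decoding.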
- **Concept Decodability (CD) calculation**:
CD is computed from a k-nearest-neighbor (k-NN) classification of the latent concept based on the model's representations: