States Hidden in Hidden States: LLMs Emerge Discrete State Representations Implicitly

Junhao Chen, Shengding Hu, Zhiyuan Liu, Maosong Sun
2024-07-16
Abstract: Large Language Models (LLMs) exhibit various emergent abilities. Among these abilities, some might reveal the internal working mechanisms of models. In this paper, we uncover a novel emergent capability in models: the intrinsic ability to perform extended sequences of calculations without relying on chain-of-thought step-by-step solutions. Remarkably, the most advanced models can directly output the results of two-digit number additions with lengths extending up to 15 addends. We hypothesize that the model emerges Implicit Discrete State Representations (IDSRs) within its hidden states and performs symbolic calculations internally. To test this hypothesis, we design a sequence of experiments that look into the hidden states. Specifically, we first confirm that IDSRs exist. Then, we provide interesting observations about the formation of IDSRs from layer, digit, and sequence perspectives. Finally, we confirm that models indeed use IDSRs to produce the final answers. However, we also discover that these state representations are far from lossless in current open-sourced models, leading to inaccuracies in their final performance. Our work presents a novel exploration of LLMs' symbolic calculation abilities and the underlying mechanisms.
Computation and Language
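As a concrete illustration of the task described in the abstract, the sketch below builds a problem with 15 two-digit addends and records the running sum after each addend. Under the paper's hypothesis, partial sums like these are the kind of discrete state an IDSR would carry at each operator token. Treating them as probe targets, the `a + b + ... =` prompt format, and the names `make_problem` and `running_sums` are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import List, Optional, Tuple


def running_sums(addends: List[int]) -> List[int]:
    """Partial sum after each addend; the last entry is the final answer."""
    sums, total = [], 0
    for a in addends:
        total += a
        sums.append(total)
    return sums


def make_problem(num_addends: int, digits: int = 2,
                 seed: Optional[int] = None) -> Tuple[str, List[int], List[int]]:
    """Build a hypothetical 'a + b + ... =' prompt with `digits`-digit addends.

    Returns the prompt, the addends, and the running sum at the operator token
    that follows each addend (every '+' and, last of all, the '=').
    """
    rng = random.Random(seed)
    low, high = 10 ** (digits - 1), 10 ** digits - 1
    addends = [rng.randint(low, high) for _ in range(num_addends)]
    prompt = " + ".join(str(a) for a in addends) + " ="
    return prompt, addends, running_sums(addends)


prompt, addends, states = make_problem(15, seed=0)
print(prompt)
print(states)  # one running sum per '+' token, then the final answer at '='
```

A model that answers directly must, in effect, arrive at `states[-1]` without writing out any of the earlier entries; the IDSR hypothesis is that those earlier entries are nonetheless recoverable from its hidden states.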
What problem does this paper attempt to address?
### The Problem the Paper Attempts to Solve

This paper explores a novel implicit capability of large language models (LLMs) in continuous addition tasks: the ability to output the result directly, without step-by-step reasoning. Specifically, the authors found that state-of-the-art models can directly give the sum of up to 15 two-digit numbers without generating any tokens for intermediate steps. To explain this ability, the authors propose a central hypothesis: LLMs internally form Implicit Discrete State Representations (IDSRs) and perform symbolic calculations within their hidden states. Through a series of experiments that examine whether IDSRs exist in the hidden states, what properties they have, and how they form, the authors aim to reveal how LLMs work internally when performing such tasks.

### Main Research Questions

1. **Do IDSRs truly exist?**
2. **What are the properties of IDSRs?**
3. **How are IDSRs formed?**
4. **How do models utilize IDSRs?**

### Experimental Methods

1. **Dataset Construction**: The authors constructed a dataset of continuous addition and subtraction problems that vary in length, number of digits, and prompt type.
2. **Hidden State Extraction**: During inference, the hidden states at specific tokens (such as +, -, and =) were extracted from different layers of the model.
3. **Classification Probes**: Multi-layer perceptron (MLP) probes were trained as classifiers on these hidden states to verify the existence and properties of IDSRs.
4. **Evaluation Metrics**: Exact Accuracy (EA) was the primary measure of the model's ability to perform continuous addition; the authors also computed Individual Digit Accuracy (IDA) and Overall Exact Accuracy (OEA).

### Experimental Results

1. **Integer Prediction**: Probes trained to predict the entire number achieved accuracy significantly higher than random guessing in all cases, evidence that IDSRs exist.
2. **Digit-wise Prediction**: Separate probes trained to predict each digit maintained high accuracy in the early layers and after the second plus sign, further supporting the existence of IDSRs.

### Conclusion

The authors experimentally verified that LLMs do form Implicit Discrete State Representations (IDSRs) when performing continuous addition, and that these representations form gradually and propagate through the model's layers. However, the IDSRs of current open-source models still suffer from some data loss and resolution loss, which may be one reason their final answers are inaccurate. Future research will explore how to reduce this loss and thereby enhance the capabilities of LLMs.
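The probing step in the methods can be sketched as follows. This is a minimal NumPy illustration, not the authors' code: a one-hidden-layer ReLU MLP trained with full-batch gradient descent to classify one digit (10 classes) from a hidden-state vector. Because running a real LLM is out of scope here, the "hidden states" are synthetic vectors with a planted per-class signal; in practice one might instead collect a chosen layer's activations at each operator token (for example by calling a Hugging Face Transformers model with `output_hidden_states=True`). The function name `train_mlp_probe`, the hidden width, learning rate, and epoch count are all hypothetical choices.

```python
import numpy as np


def train_mlp_probe(X, y, num_classes, hidden=64, lr=0.5, epochs=300, seed=0):
    """Train a one-hidden-layer ReLU MLP probe with softmax cross-entropy.

    X: (n, d) array of hidden-state vectors; y: (n,) integer class labels.
    Returns a function mapping an (m, d) array to (m,) predicted labels.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(hidden), size=(hidden, num_classes))
    b2 = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets

    for _ in range(epochs):
        # Forward pass.
        H = np.maximum(0.0, X @ W1 + b1)
        logits = H @ W2 + b2
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        # Backward pass for the mean cross-entropy loss.
        G = (P - Y) / n
        gW2, gb2 = H.T @ G, G.sum(axis=0)
        GH = (G @ W2.T) * (H > 0)  # gradient through the ReLU
        gW1, gb1 = X.T @ GH, GH.sum(axis=0)
        W1 -= lr * gW1
        b1 -= lr * gb1
        W2 -= lr * gW2
        b2 -= lr * gb2

    def predict(Xq):
        return np.argmax(np.maximum(0.0, Xq @ W1 + b1) @ W2 + b2, axis=1)

    return predict


# Synthetic stand-in for hidden states: each digit class gets its own mean vector.
rng = np.random.default_rng(1)
d, n_train, n_test = 32, 2000, 500
class_means = rng.normal(size=(10, d))
y_train = rng.integers(0, 10, size=n_train)
y_test = rng.integers(0, 10, size=n_test)
X_train = class_means[y_train] + 0.3 * rng.normal(size=(n_train, d))
X_test = class_means[y_test] + 0.3 * rng.normal(size=(n_test, d))

probe = train_mlp_probe(X_train, y_train, num_classes=10)
test_acc = float((probe(X_test) == y_test).mean())
print(f"probe accuracy {test_acc:.2f} vs. chance 0.10")
```

Comparing held-out accuracy with the 1/10 chance level mirrors the comparison against random guessing in the Experimental Results; fitting one probe per layer and per operator position would give the layer-wise and sequence-wise views the summary describes.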
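The three evaluation metrics can likewise be sketched in plain Python. The summary does not define them precisely, so the definitions here are an assumed reading: EA counts exact matches of whole numbers, IDA is the accuracy of each digit position scored separately, and OEA counts an example as correct only when every digit probe is right at once. Targets are assumed to be non-negative and zero-padded to a fixed width, and all function names are hypothetical.

```python
from typing import List, Sequence


def digits_of(n: int, width: int) -> List[int]:
    """Split a non-negative integer into `width` digits, most significant first."""
    if n < 0 or len(str(n)) > width:
        raise ValueError("expected a non-negative integer with at most `width` digits")
    return [int(c) for c in str(n).zfill(width)]


def exact_accuracy(preds: Sequence[int], targets: Sequence[int]) -> float:
    """EA (assumed): fraction of whole-number predictions that match exactly."""
    return sum(p == t for p, t in zip(preds, targets)) / len(targets)


def individual_digit_accuracy(digit_preds, targets, width) -> List[float]:
    """IDA (assumed): accuracy of each digit position, scored separately.

    digit_preds[i][k] is the k-th predicted digit (most significant first) for example i.
    """
    gold = [digits_of(t, width) for t in targets]
    return [
        sum(dp[k] == g[k] for dp, g in zip(digit_preds, gold)) / len(targets)
        for k in range(width)
    ]


def overall_exact_accuracy(digit_preds, targets, width) -> float:
    """OEA (assumed): fraction of examples whose digits are all predicted correctly."""
    gold = [digits_of(t, width) for t in targets]
    return sum(list(dp) == g for dp, g in zip(digit_preds, gold)) / len(targets)


targets = [102, 46, 7]
whole_preds = [102, 45, 7]
digit_preds = [[1, 0, 2], [0, 4, 7], [0, 0, 7]]
print(exact_accuracy(whole_preds, targets))
print(individual_digit_accuracy(digit_preds, targets, width=3))
print(overall_exact_accuracy(digit_preds, targets, width=3))
```

With these toy inputs, EA and OEA are both 2/3, and IDA is 1.0 for the first two positions and 2/3 for the last, because only the units digit of 46 is mispredicted. Reporting IDA alongside OEA shows whether errors concentrate in particular digit positions.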