Locating Information in Large Language Models via Random Matrix Theory

Max Staats, Matthias Thamm, Bernd Rosenow
2024-10-23
Abstract: As large language models (LLMs) become central to AI applications, gaining a deeper understanding of their inner workings is increasingly important. In this work, we analyze the weight matrices of pretrained transformer models -- specifically BERT and Llama -- using random matrix theory (RMT) as a zero-information hypothesis. While randomly initialized weights perfectly agree with RMT predictions, deviations emerge after training, allowing us to locate learned structures within the models. We identify layer-type specific behaviors that are consistent across all blocks and architectures considered. By pinpointing regions that deviate from RMT predictions, we highlight areas of feature learning and confirm this through comparisons with the activation covariance matrices of the corresponding layers. Our method provides a diagnostic tool for identifying relevant regions in transformer weights using only the trained matrices. Additionally, we address the ongoing debate regarding the significance of small singular values in the context of fine-tuning and alignment in LLMs. Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities, suggesting that removing them in an already aligned transformer can be detrimental, as it may compromise model alignment.
Machine Learning, Disordered Systems and Neural Networks
### What problems does this paper attempt to solve?

This paper uses random matrix theory (RMT) to study the internal workings of large language models (LLMs). The authors analyze the weight matrices of pretrained Transformer models (BERT and Llama) and address the following questions:

1. **Location of information storage**: Using RMT as a baseline, the authors determine which regions of an LLM's weights store learned information. After training, the weight matrices deviate from RMT predictions, and these deviations locate the learned features in the model.
2. **Layer-type-specific behavior**: The authors compare the different types of weight matrices (query, key, value, attention.output, and so on) before and after training. Some matrix types deviate significantly from RMT predictions after training, while others remain close to their initial state, which indicates that different matrix types play different roles in feature learning.
3. **Importance of small singular values**: The authors examine the role of small singular values in fine-tuning and alignment. Some studies suggest that small singular values are crucial for a model's ability to generalize, while others report that removing them can be beneficial. The authors' experiments show that small singular values play an important role in fine-tuned models and that removing them can damage performance and, as the abstract notes, may compromise alignment.
4. **Feature learning versus lazy learning**: The authors distinguish the two regimes by comparing the spectral distributions of different matrices. In feature learning, the weights change significantly during training; in lazy learning, they remain close to the initial random state. The authors find that query matrices show more pronounced feature learning, while attention.output matrices are closer to lazy learning.
5. **Validation of the method**: The authors validate their approach by comparing the singular vectors of the weight matrices with the eigenvectors of the activation covariance matrices. The regions that deviate from RMT predictions do correspond to learned features.

### Summary

By applying random matrix theory, this paper provides a method for diagnosing where information is stored and where feature learning takes place in large language models. The work reveals behavioral differences between layer types during learning and offers new insight into the role of small singular values in model performance.
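To make the idea of RMT as a zero-information baseline concrete, the following is a minimal sketch and not the authors' procedure. It assumes a weight matrix with i.i.d. zero-mean entries, for which the singular values of an n x m matrix with entry standard deviation sigma asymptotically fall between sigma·|√m − √n| and sigma·(√m + √n) (the Marchenko-Pastur bulk). Singular values above the upper edge are then candidates for learned structure. The function names, the synthetic matrices, the planted rank-1 "feature", and the 5% tolerance are all illustrative assumptions; the last function shows what discarding small singular values means in practice.

```python
import numpy as np


def mp_singular_value_edges(shape, sigma):
    """Asymptotic bulk edges for the singular values of an n x m matrix
    whose entries are i.i.d. with mean 0 and standard deviation sigma."""
    n, m = shape
    lower = sigma * abs(np.sqrt(m) - np.sqrt(n))
    upper = sigma * (np.sqrt(m) + np.sqrt(n))
    return lower, upper


def count_outliers(weight, tol=0.05):
    """Count singular values above the Marchenko-Pastur upper edge.

    sigma is estimated from the matrix entries themselves; tol is an
    illustrative safety margin for finite-size fluctuations (assumption).
    """
    singular_values = np.linalg.svd(weight, compute_uv=False)
    _, upper = mp_singular_value_edges(weight.shape, weight.std())
    return int(np.sum(singular_values > (1.0 + tol) * upper))


def drop_small_singular_values(weight, keep):
    """Return the rank-`keep` approximation, discarding the smallest singular values."""
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    return (u[:, :keep] * s[:keep]) @ vt[:keep]


rng = np.random.default_rng(0)
n, m, sigma = 256, 512, 0.02

# Synthetic stand-in for a randomly initialized weight matrix.
random_weight = rng.normal(0.0, sigma, size=(n, m))

# Synthetic stand-in for a trained matrix: the same noise plus one
# planted rank-1 direction playing the role of a learned feature.
u = rng.normal(size=n)
u /= np.linalg.norm(u)
v = rng.normal(size=m)
v /= np.linalg.norm(v)
trained_weight = random_weight + 3.0 * np.outer(u, v)

print("outliers (random): ", count_outliers(random_weight))
print("outliers (planted):", count_outliers(trained_weight))

# Keep only the 10 largest singular values of the "trained" matrix.
truncated = drop_small_singular_values(trained_weight, keep=10)
print("rank after truncation:", np.linalg.matrix_rank(truncated))
```

On real model weights the entries are not guaranteed to be i.i.d., so the edge computed here is only a rough reference point. The paper's own analysis compares against RMT predictions in more detail than this simple edge test.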