Abstract: As large language models (LLMs) become central to AI applications, gaining a deeper understanding of their inner workings is increasingly important. In this work, we analyze the weight matrices of pretrained transformer models (specifically BERT and Llama) using random matrix theory (RMT) as a zero-information hypothesis. While randomly initialized weights perfectly agree with RMT predictions, deviations emerge after training, allowing us to locate learned structures within the models. We identify layer-type-specific behaviors that are consistent across all blocks and architectures considered. By pinpointing regions that deviate from RMT predictions, we highlight areas of feature learning and confirm this through comparisons with the activation covariance matrices of the corresponding layers. Our method provides a diagnostic tool for identifying relevant regions in transformer weights using only the trained matrices. Additionally, we address the ongoing debate regarding the significance of small singular values in the context of fine-tuning and alignment in LLMs. Our findings reveal that, after fine-tuning, small singular values play a crucial role in the models' capabilities, suggesting that removing them in an already aligned transformer can be detrimental, as it may compromise model alignment.
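
The zero-information baseline mentioned in the abstract can be made concrete with the Marchenko-Pastur (MP) law, the standard RMT prediction for the eigenvalue spectrum of a matrix with i.i.d. entries. The sketch below is an illustration under stated assumptions, not the authors' code: it draws a hypothetical 300 x 1200 Gaussian matrix as a stand-in for a randomly initialized layer (the init std `sigma = 0.02` is an assumption), computes the eigenvalues of W W^T / m, and measures their Kolmogorov-Smirnov distance to the MP law. A small distance indicates agreement with RMT; according to the abstract, it is the trained matrices that deviate from this baseline.

```python
import numpy as np

def mp_edges(q, sigma):
    """Bulk edges (lambda_minus, lambda_plus) of the Marchenko-Pastur law for aspect ratio q <= 1."""
    return sigma**2 * (1 - np.sqrt(q)) ** 2, sigma**2 * (1 + np.sqrt(q)) ** 2

def mp_density(x, q, sigma):
    """MP density of the eigenvalues of W W^T / m for an n x m matrix W with
    i.i.d. zero-mean entries of variance sigma**2 and q = n / m <= 1."""
    lam_minus, lam_plus = mp_edges(q, sigma)
    x = np.asarray(x, dtype=float)
    dens = np.zeros_like(x)
    inside = (x > lam_minus) & (x < lam_plus)
    xi = x[inside]
    dens[inside] = np.sqrt((lam_plus - xi) * (xi - lam_minus)) / (2 * np.pi * sigma**2 * q * xi)
    return dens

def mp_cdf(x, q, sigma, grid_size=20001):
    """Numerical MP cumulative distribution (trapezoidal rule on a fine grid)."""
    lam_minus, lam_plus = mp_edges(q, sigma)
    grid = np.linspace(lam_minus, lam_plus, grid_size)
    pdf = mp_density(grid, q, sigma)
    cdf = np.concatenate([[0.0], np.cumsum(0.5 * (pdf[1:] + pdf[:-1]) * np.diff(grid))])
    cdf /= cdf[-1]
    return np.interp(x, grid, cdf)  # clamps to 0 below and 1 above the bulk

def empirical_spectrum(W):
    """Eigenvalues of the smaller Gram matrix, computed as squared singular values / larger dimension."""
    s = np.linalg.svd(W, compute_uv=False)
    return s**2 / max(W.shape)

def ks_distance(eigs, q, sigma):
    """Kolmogorov-Smirnov distance between the empirical spectrum and the MP law."""
    e = np.sort(eigs)
    k = e.size
    F = mp_cdf(e, q, sigma)
    return float(max((np.arange(1, k + 1) / k - F).max(), (F - np.arange(k) / k).max()))

# Hypothetical "freshly initialized" layer; sigma = 0.02 is an assumed init std.
rng = np.random.default_rng(0)
sigma = 0.02
W = rng.normal(0.0, sigma, size=(300, 1200))
n, m = sorted(W.shape)
q = n / m

eigs = empirical_spectrum(W)
lam_minus, lam_plus = mp_edges(q, sigma)
ks = ks_distance(eigs, q, sigma)
print(f"spectrum [{eigs.min():.3e}, {eigs.max():.3e}], "
      f"MP bulk [{lam_minus:.3e}, {lam_plus:.3e}], KS = {ks:.3f}")
```

All function names here are hypothetical. The KS distance is only one possible goodness-of-fit measure; overlaying a histogram of `eigs` on `mp_density` is a common visual alternative.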

### What problems does this paper attempt to solve?

This paper aims to gain an in-depth understanding of the internal working mechanisms of large language models (LLMs) through random matrix theory (RMT). Specifically, the authors analyze the weight matrices of pretrained transformer models (such as BERT and Llama) and attempt to answer the following key questions:

1. **Where information is stored**: Using RMT analysis, the authors try to determine which regions of an LLM store learned information. They find that after training, the weight matrices deviate from RMT predictions, and these deviations help locate the learned features in the model.
2. **Layer-type-specific behavior**: The authors study how different types of weight matrices (such as query, key, value, and attention.output) behave before and after training. Some types deviate significantly from RMT predictions after training, while others remain close to their initial state, indicating that different matrix types play different roles in feature learning.
3. **Importance of small singular values**: The authors examine the role of small singular values in fine-tuning and alignment. Some studies suggest that small singular values are crucial for a model's generalization ability, while others argue that removing them can be beneficial. The authors show experimentally that small singular values do play an important role in fine-tuned models and that removing them may damage the model's performance and compromise its alignment.
4. **Feature learning versus lazy learning**: The authors distinguish feature learning from lazy learning by comparing the spectral distributions of different matrices. In feature learning, the weights change significantly during training; in lazy learning, they stay close to their initial random state. They find that query matrices show more pronounced feature learning, while attention.output matrices are closer to lazy learning.
5. **Validating the method**: The authors validate their approach by comparing the singular vectors of the weight matrices with the eigenvectors of the corresponding activation covariance matrices. The results show that the regions that deviate from RMT predictions do indeed correspond to learned features.

### Summary

Overall, this paper applies random matrix theory to provide a new method for diagnosing and understanding information storage and feature learning in large language models. The work reveals how different layer types behave during learning and offers new insights into the role of small singular values in model performance.
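
The core diagnostic described above, in which singular values that escape the RMT bulk mark learned structure, can be illustrated on synthetic data. This is a minimal sketch under stated assumptions, not the authors' implementation: it plants a hypothetical rank-3 "learned" component in a Gaussian matrix, flags normalized singular values more than 5% above the MP edge (the tolerance `tol` is an arbitrary choice), and checks how much of the planted subspace the flagged singular vectors recover. That last check is only loosely analogous to the paper's comparison with activation covariance eigenvectors. The hypothetical helper `drop_smallest_singular_values` illustrates the kind of small-singular-value removal discussed in item 3. The noise level `sigma` is assumed known here; for real trained weights it would have to be estimated, for example from the bulk of the spectrum.

```python
import numpy as np

def mp_singular_edge(shape, sigma):
    """Upper MP edge for the singular values of W / sqrt(max(shape)),
    assuming i.i.d. zero-mean entries with standard deviation sigma."""
    n, m = sorted(shape)
    return sigma * (1.0 + np.sqrt(n / m))

def find_outliers(W, sigma, tol=0.05):
    """Normalized singular values, with their singular vectors, that lie
    more than a fraction `tol` above the MP edge."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_norm = s / np.sqrt(max(W.shape))
    mask = s_norm > (1.0 + tol) * mp_singular_edge(W.shape, sigma)
    return s_norm[mask], U[:, mask], Vt[mask, :]

def drop_smallest_singular_values(W, k):
    """Rebuild W with its k smallest singular values set to zero (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s = s.copy()
    if k > 0:
        s[-k:] = 0.0  # np.linalg.svd returns singular values in descending order
    return (U * s) @ Vt

# Synthetic stand-in for a trained layer: Gaussian noise plus a planted rank-3 "signal".
rng = np.random.default_rng(0)
n, m, sigma = 300, 1200, 1.0
U_true, _ = np.linalg.qr(rng.normal(size=(n, 3)))  # orthonormal planted left directions
V_true, _ = np.linalg.qr(rng.normal(size=(m, 3)))  # orthonormal planted right directions
thetas = np.array([5.0, 4.0, 3.0])                  # spike strengths in normalized units
W = rng.normal(0.0, sigma, size=(n, m)) + np.sqrt(m) * (U_true * thetas) @ V_true.T

s_out, U_out, _ = find_outliers(W, sigma)
# Average squared cosine between planted and flagged left subspaces (1.0 = perfect recovery).
overlap = np.linalg.norm(U_true.T @ U_out) ** 2 / U_true.shape[1]
print(f"{len(s_out)} outliers at {np.round(s_out, 2)}, subspace overlap {overlap:.3f}")
```

With spikes this strong the bulk stays below the MP edge while each planted direction produces one clear outlier. Weaker spikes near the detection threshold would merge into the bulk, which is one reason a fixed tolerance is only a heuristic.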

Locating Information in Large Language Models via Random Matrix Theory