Abstract: Projecting intermediate representations onto the vocabulary, an approach known as the logit lens, is an increasingly popular interpretation tool for transformer-based LLMs. We propose a quantitative extension to this approach and define spectral filters on intermediate representations based on partitioning the singular vectors of the vocabulary embedding and unembedding matrices into bands. We find that the signals exchanged in the tail end of the spectrum are responsible for attention sinking (Xiao et al. 2023), for which we provide an explanation. We find that the loss of pretrained models can be kept low despite suppressing sizable parts of the embedding spectrum in a layer-dependent way, as long as attention sinking is preserved. Finally, we discover that the representations of tokens that draw attention from many tokens have large projections on the tail end of the spectrum.
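The band partitioning described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the matrix `W_U` is a random stand-in for an unembedding matrix with an assumed shape of `(vocab, d_model)`, and the helper name `band_filter` and the band boundaries are hypothetical.

```python
import numpy as np

def band_filter(W, start, stop):
    """Return a projector onto the singular-vector band [start, stop) of W.

    W is an assumed (vocab, d_model) matrix. Its right singular vectors live
    in the model dimension and are ordered by decreasing singular value.
    """
    _, _, Vt = np.linalg.svd(W, full_matrices=False)
    V_band = Vt[start:stop].T      # (d_model, band_width)
    return V_band @ V_band.T       # (d_model, d_model) orthogonal projector

rng = np.random.default_rng(0)
vocab, d_model = 50, 8
W_U = rng.normal(size=(vocab, d_model))  # random stand-in, not real weights
x = rng.normal(size=d_model)             # stand-in intermediate representation

P_head = band_filter(W_U, 0, 6)  # band of the 6 largest singular values
P_tail = band_filter(W_U, 6, 8)  # tail band: the 2 smallest singular values

x_head, x_tail = P_head @ x, P_tail @ x  # filtered representations
logits_tail = W_U @ x_tail               # logit-lens view of the tail band only
```

Because the bands partition an orthonormal basis, `x_head + x_tail` recovers `x`. In this sketch the norm of `logits_tail` is bounded by the largest singular value in the tail band times the norm of `x_tail`, which illustrates why a signal confined to the tail has a comparatively small effect on the projected logits.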

What problem does this paper attempt to address?

This paper explores and explains a phenomenon in large language models (LLMs): **dark signals** and their relationship with **attention sinks**. Specifically, the authors address the following questions:

1. **The role of dark signals**:
   - Dark signals are the components of a representation that lie in the linear subspace spanned by the tail singular vectors of the model's embedding and unembedding matrices. The authors hypothesize that these signals may be used to maintain global features while minimizing interference with next-token prediction. However, the study finds that the main role of dark signals is to act as a collector for heads that require "attention sinks".
2. **The working mechanism of attention sinks**:
   - An attention sink is a phenomenon in which the special beginning-of-sentence (BoS) token receives a disproportionately large share of attention. In some cases the model needs to allocate excess attention to this token to avoid affecting the probability distribution of other tokens. The authors explain this process in detail through experiments and show how it is achieved using dark signals.
3. **Suppressing the influence of dark signals**:
   - The study also explores the effect of suppressing part of the embedding spectrum while maintaining attention sinks. The results show that even when a considerable part of the embedding spectrum is suppressed, the model can still maintain a low loss as long as the dark signals are retained.
4. **The relationship between dark signals and generation quality**:
   - The authors further study the effect of suppressing dark signals on the quality of generated text. The results show that when dark signals are suppressed, the generated text may fall into a repetitive pattern, which is related to attention heads over-copying the representations of previous tokens.
5. **The function of Attention Bars**:
   - The paper also explores the "Attention Bars" that often appear in the attention matrix, that is, columns where certain other tokens also receive a large amount of attention. The authors hypothesize that these tokens may be additional attention sinks, but the study does not reach a definite conclusion.

### Main contributions

- **Introducing spectral filters**: A new tool for analyzing the content of the model's residual stream and the parameter matrices that interact with it.
- **Explaining the attention sink mechanism**: A detailed description of how dark signals help to realize attention sinks, together with evidence of their importance for model performance.
- **Proposing the possibility of spectral compression**: Based on the spectral filter results, the paper suggests that models could be optimized in the future by compressing the low-frequency components that are not used for attention sinks.

Through these studies, the authors hope to provide a new perspective on the internal working principles of large language models and a basis for further improving the safety and controllability of these models.
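As a toy illustration of the attention sink behavior summarized in item 2 above, the sketch below shows how softmax attention, whose weights must sum to one, can park excess attention on a single sink token. It is not taken from the paper: the scores, the sink score of `6.0`, and in particular the zero value vector assigned to the sink are assumptions chosen only to make the effect visible.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
scores = np.array([0.1, -0.2, 0.0, 0.3])  # assumed scores for 4 content tokens
values = rng.normal(size=(4, 3))          # assumed value vectors for those tokens

# Without a sink, all attention mass is spread over the content tokens.
w_no_sink = softmax(scores)
out_no_sink = w_no_sink @ values

# With a sink (e.g. a BoS token) that scores highly and, by assumption,
# carries a zero value vector, most of the mass goes to the sink.
sink_score = 6.0
sink_value = np.zeros(3)
w_sink = softmax(np.concatenate([[sink_score], scores]))
out_sink = w_sink @ np.vstack([sink_value, values])

print("attention on sink:", w_sink[0])
print("output norm without sink:", np.linalg.norm(out_no_sink))
print("output norm with sink:", np.linalg.norm(out_sink))
```

In this toy setting the relative weights among the content tokens are unchanged, so the head output with the sink is the output without it scaled by the attention mass left over for the content tokens. A head can therefore contribute almost nothing without altering how it ranks the other tokens.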

Spectral Filters, Dark Signals, and Attention Sinks

Spectral Imaging with Deep Learning.

When Attention Sink Emerges in Language Models: An Empirical View

Spectral Probing

Modulate Your Spectrum in Self-Supervised Learning

Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs

TidalDecode: Fast and Accurate LLM Decoding with Position Persistent Sparse Attention

Spectal Harmonics: Bridging Spectral Embedding and Matrix Completion in Self-Supervised Learning

An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models

ResiDual Transformer Alignment with Spectral Decomposition

Balanced Data, Imbalanced Spectra: Unveiling Class Disparities with Spectral Imbalance

Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs

Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations

Latent Functional Maps: a spectral framework for representation alignment

A Spectral Theory of Neural Prediction and Alignment

Spectral Transform Forms Scalable Transformer

Balancing Embedding Spectrum for Recommendation

Spectral regression: a unified subspace learning framework for content-based image retrieval.

Spectral Representations for Convolutional Neural Networks

Spectral Learning of Latent Semantics for Action Recognition

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models