Abstract: We introduce a category-theoretic diagrammatic formalism in order to systematically relate and reason about machine learning models. Our diagrams present architectures intuitively but without loss of essential detail, where natural relationships between models are captured by graphical transformations, and important differences and similarities can be identified at a glance. In this paper, we focus on attention mechanisms: translating folklore into mathematical derivations, and constructing a taxonomy of attention variants in the literature. As a first example of an empirical investigation underpinned by our formalism, we identify recurring anatomical components of attention, which we exhaustively recombine to explore a space of variations on the attention mechanism.

What problem does this paper attempt to address?

The paper explores how to represent and understand machine learning models systematically, attention mechanisms in particular. It introduces a new diagrammatic form, string diagrams, for comprehending and comparing deep learning architectures. The paper addresses two core issues:

1. **The trade-off between formal detail and abstract perspective**: A description of a deep learning architecture must be formal enough to be precise, yet still let readers grasp the conceptual differences between models intuitively.
2. **The extension of formal expressive power**: Existing graphical representations often lack an inherent way to compare structural differences between models.

To solve these problems, the authors propose string diagrams as a graphical representation and define a set of rewriting rules on them. String diagrams combine formality with intuitiveness, so researchers can switch freely between levels of abstraction, and the rewriting rules make it possible to explore the relationships between models formally.

For attention mechanisms specifically, the paper first constructs a taxonomy of attention variants. The authors then analyze attention formally using string diagrams and rewriting rules, and study empirically how the structure of an attention mechanism affects its performance. The experimental section lists a series of common attention components, recombines them into various attention mechanisms, and evaluates the results on word-level language modeling tasks.

The paper finds that differences in attention structure seem to have little impact on performance in representative tasks, which suggests that the specific structure of the attention mechanism may not be the key factor determining model performance. This conclusion challenges the current understanding of the internal workings of Transformer models and suggests that other types of models, or attention mechanisms at larger scale, may perform better.
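The idea of recombining attention components exhaustively can be illustrated with a small sketch. Everything named here is an assumption made for illustration: the two score functions, the two normalizers, and the helpers `attend` and `all_variants` are hypothetical and are not the component list or the code used in the paper. The sketch only shows how a handful of interchangeable parts yields a space of attention variants that can each be evaluated in turn.

```python
import itertools
import math

# Hypothetical component choices, for illustration only; the paper's actual
# components are not listed in this summary.

def dot_score(q, k):
    """Unscaled dot product between a query and a key."""
    return sum(qi * ki for qi, ki in zip(q, k))

def scaled_dot_score(q, k):
    """Dot product divided by sqrt(dimension), as in standard Transformers."""
    return dot_score(q, k) / math.sqrt(len(q))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def uniform(scores):
    """Baseline normalizer that ignores the scores and averages the values."""
    return [1.0 / len(scores)] * len(scores)

SCORES = {"dot": dot_score, "scaled_dot": scaled_dot_score}
NORMALIZERS = {"softmax": softmax, "uniform": uniform}

def attend(query, keys, values, score, normalize):
    """Single-query attention: weight each value by its normalized score."""
    weights = normalize([score(query, k) for k in keys])
    dim = len(values[0])
    return [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]

def all_variants():
    """Yield (name, function) for every score/normalizer combination."""
    pairs = itertools.product(SCORES.items(), NORMALIZERS.items())
    for (s_name, s), (n_name, n) in pairs:
        # Bind s and n as defaults so each lambda keeps its own components.
        yield f"{s_name}+{n_name}", (
            lambda q, ks, vs, s=s, n=n: attend(q, ks, vs, s, n)
        )

if __name__ == "__main__":
    query = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    for name, variant in all_variants():
        print(name, variant(query, keys, values))
```

With two choices in each of two slots, the sketch produces four variants. Adding a slot or a choice multiplies the number of combinations, which is why an exhaustive study of this kind depends on first identifying a small set of recurring components.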

On the Anatomy of Attention
