Abstract: The attention module is the key component in Transformers. While the global attention mechanism offers high expressiveness, its excessive computational cost restricts its applicability in various scenarios. In this paper, we propose a novel attention paradigm, Agent Attention, to strike a favorable balance between computational efficiency and representation power. Specifically, Agent Attention, denoted as a quadruple $(Q, A, K, V)$, introduces an additional set of agent tokens $A$ into the conventional attention module. The agent tokens first act as agents for the query tokens $Q$, aggregating information from $K$ and $V$, and then broadcast that information back to $Q$. Because the number of agent tokens can be designed to be much smaller than the number of query tokens, agent attention is significantly more efficient than the widely adopted Softmax attention while preserving global context modelling capability. Interestingly, we show that the proposed agent attention is equivalent to a generalized form of linear attention. Therefore, agent attention seamlessly integrates the powerful Softmax attention and the highly efficient linear attention. Extensive experiments demonstrate the effectiveness of agent attention with various vision Transformers and across diverse vision tasks, including image classification, object detection, semantic segmentation and image generation. Notably, agent attention has shown remarkable performance in high-resolution scenarios, owing to its linear attention nature. For instance, when applied to Stable Diffusion, our agent attention accelerates generation and substantially enhances image generation quality without any additional training. Code is available at https://github.com/LeapLabTHU/Agent-Attention.
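The two-step aggregate-then-broadcast computation described in the abstract can be sketched in a few lines of NumPy. This is a minimal single-head illustration, not the released implementation: the function names `softmax` and `agent_attention`, the use of Softmax in both steps, the $1/\sqrt{d}$ scaling, and the construction of $A$ by average-pooling the queries are all assumptions made here for the example.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def agent_attention(Q, A, K, V):
    """Hypothetical single-head sketch of the (Q, A, K, V) quadruple.

    Q: (N, d) queries, A: (n, d) agent tokens, K: (N, d) keys, V: (N, d_v) values.
    Assumes Softmax attention in both steps and 1/sqrt(d) scaling.
    """
    scale = 1.0 / np.sqrt(Q.shape[-1])
    # Step 1: agents aggregate information from K and V -> (n, d_v).
    agent_values = softmax(A @ K.T * scale) @ V
    # Step 2: agents broadcast the aggregated information back to Q -> (N, d_v).
    return softmax(Q @ A.T * scale) @ agent_values

rng = np.random.default_rng(0)
N, n, d = 1024, 16, 32            # n agent tokens, with n much smaller than N
Q = rng.standard_normal((N, d))
K = rng.standard_normal((N, d))
V = rng.standard_normal((N, d))
# Assumption: build agent tokens by average-pooling contiguous groups of queries.
A = Q.reshape(n, N // n, d).mean(axis=1)

out = agent_attention(Q, A, K, V)
print(out.shape)
```

Under these assumptions, neither step forms an $N \times N$ attention matrix: the two score matrices have shapes $(n, N)$ and $(N, n)$, so the cost grows linearly in $N$ for fixed $n$. Grouping the product as `softmax(Q A^T) @ (softmax(A K^T) @ V)` is also what makes the result read as a linear-attention-style factorization.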

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture

Enhancing Neural Machine Translation of Low-Resource Languages: Corpus Development, Human Evaluation and Explainable AI Architectures

Attention is All you Need

Lite Transformer with Long-Short Range Attention

Transformers for Low-Resource Languages: Is Féidir Linn!

Efficient Long-Range Transformers: You Need to Attend More, but Not Necessarily at Every Layer

Effective Approaches to Attention-based Neural Machine Translation

Efficient Machine Translation with a BiLSTM-Attention Approach

Neural Architecture Search on Efficient Transformers and Beyond

Alleviating the Inequality of Attention Heads for Neural Machine Translation

Adaptive Multi-Resolution Attention with Linear Complexity

X-Transformer: A Machine Translation Model Enhanced by the Self-Attention Mechanism

Attention Mechanism and Context Modeling System for Text Mining Machine Translation

Agent Attention: On the Integration of Softmax and Linear Attention

Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Gated Linear Attention Transformers with Hardware-Efficient Training

Only 5% Attention Is All You Need: Efficient Long-range Document-level Neural Machine Translation

How Effective are State Space Models for Machine Translation?

Attention-via-Attention Neural Machine Translation

Relaxed Attention for Transformer Models