Abstract: Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
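The idea of attention as a dynamic weight adjustment driven by the input can be written compactly. The notation below is an illustrative sketch, not quoted from the paper. Here g computes attention weights from the input x, and f applies those weights to x. Squeeze-and-excitation (SE) channel attention is shown as one concrete instance.

```latex
\mathrm{Attention}(x) = f\bigl(g(x),\, x\bigr)

% One instance: SE-style channel attention for x \in \mathbb{R}^{C \times H \times W}
g(x) = \sigma\bigl(W_2\,\delta(W_1\,\mathrm{GAP}(x))\bigr) \in \mathbb{R}^{C},
\qquad
f\bigl(g(x), x\bigr)_{c,h,w} = g(x)_c \, x_{c,h,w}
```

In this sketch:

- GAP is global average pooling over H and W.
- W_1 has shape (C/r) x C and W_2 has shape C x (C/r), for a reduction ratio r.
- δ is ReLU and σ is the sigmoid.

The weights g(x) are recomputed for every input. This input dependence is what makes the adjustment "dynamic".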

What problem does this paper attempt to address?

The paper is a comprehensive review of attention mechanisms in computer vision. Its core objective is to summarize and classify the attention methods currently used in the field. Its main contributions are:

1. **Systematic Review**: a systematic review of visual attention methods. It covers a unified description of attention mechanisms, the development history of visual attention, and current research progress.
2. **Classification System**: a classification of attention methods by the data domain they operate on (such as channel, spatial, temporal, and branch). This lets the methods be studied independently of any particular application.
3. **Future Directions**: suggestions for future research on visual attention mechanisms.

The paper first defines a general form of the attention mechanism, in which attention dynamically adjusts weights according to the importance of input features. It then introduces each type of attention in detail (channel, spatial, temporal, and others), covering its development history, motivation, formulation, and function.

The paper emphasizes the importance and potential of attention mechanisms in computer vision tasks. It suggests that attention-based models could replace convolutional neural networks as a more powerful and general architecture for vision. It also discusses representative works such as SENet, STN, and Non-Local Networks, along with their characteristics and contributions.
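The sketch below roughly illustrates SENet-style channel attention. It computes one weight per channel from the input itself and then rescales the channels with those weights:

- **Squeeze**: global average pooling summarizes each channel.
- **Excite**: a small bottleneck MLP followed by a sigmoid produces the weights.

This is a minimal pure-Python sketch under stated assumptions, not the paper's or any library's implementation:

- Feature maps are nested lists.
- The matrices `w1` and `w2` are hypothetical hand-set values standing in for learned parameters.
- Biases are omitted.
- All function names are illustrative.

```python
import math
from typing import List

# Feature maps are nested lists: C channels, each an H x W grid of floats.
Tensor = List[List[List[float]]]
Weights = List[List[float]]


def relu(v: float) -> float:
    return max(0.0, v)


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def dense(w: Weights, v: List[float]) -> List[float]:
    # Bias-free fully connected layer: one output per row of w.
    return [sum(wi * vi for wi, vi in zip(row, v)) for row in w]


def channel_weights(x: Tensor, w1: Weights, w2: Weights) -> List[float]:
    """Compute one attention weight in (0, 1) per channel of x.

    w1 has shape (C // r) x C and w2 has shape C x (C // r); here they are
    hypothetical fixed values standing in for learned parameters.
    """
    # Squeeze: global average pooling summarizes each channel by its mean.
    z = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]
    # Excitation: bottleneck MLP (reduce, ReLU, expand) followed by a sigmoid.
    hidden = [relu(h) for h in dense(w1, z)]
    return [sigmoid(a) for a in dense(w2, hidden)]


def se_block(x: Tensor, w1: Weights, w2: Weights) -> Tensor:
    # Scale: multiply every value in channel c by that channel's weight.
    s = channel_weights(x, w1, w2)
    return [[[s[c] * v for v in row] for row in ch] for c, ch in enumerate(x)]


if __name__ == "__main__":
    x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]  # C = 2, H = W = 2
    print(channel_weights(x, [[0.5, 0.5]], [[1.0], [-1.0]]))
```

Because the weights are computed from `x`, the rescaling changes from input to input. This distinguishes channel attention from a fixed per-channel gain. A channel whose excitation logit is positive keeps relatively more of its magnitude than one whose logit is negative.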
In short, this paper aims to provide readers with a comprehensive perspective on attention mechanisms in computer vision through a systematic literature review and classification, and to point out directions for further research in this field.
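Non-Local Networks, also mentioned above, capture long-range dependencies by letting every spatial position aggregate features from all other positions. The sketch below is simplified under stated assumptions:

- Identity maps replace the learned θ, φ, and g embeddings.
- There is no output projection.

It keeps the dot-product affinity, the softmax normalization over positions, and the residual connection. Function names are hypothetical, and the input is a flattened list of N per-position feature vectors.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def non_local(features: List[Vector]) -> List[Vector]:
    """Simplified non-local (self-attention) operation over N positions.

    features: N x C list, one C-dimensional feature vector per position.
    Assumption: theta, phi and g are identity maps and there is no learned
    output projection, unlike the 1x1 convolutions of the real block.
    """
    out = []
    for xi in features:
        # Pairwise affinities between position i and every position j.
        weights = softmax([dot(xi, xj) for xj in features])
        # Weighted sum of all positions' features gives global context.
        context = [sum(w * xj[c] for w, xj in zip(weights, features))
                   for c in range(len(xi))]
        # Residual connection: add the context back onto the original feature.
        out.append([a + b for a, b in zip(xi, context)])
    return out


if __name__ == "__main__":
    print(non_local([[1.0, 0.0], [0.0, 1.0]]))
```

Every output position depends on all input positions, which is the property that lets attention-based models reach beyond the local receptive field of a convolution.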

Attention mechanisms in computer vision: A survey

Visual Attention Methods in Deep Learning: An In-Depth Survey

An Overview of the Attention Mechanisms in Computer Vision

Attention Mechanisms in Medical Image Segmentation: A Survey

Attention mechanisms and deep learning for machine vision: A survey of the state of the art

Selective Visual Attention: Computational Models and Applications

Human Vs Machine Attention in Neural Networks: A Comparative Study

Attention mechanism in neural networks: where it comes and where it goes

A General Survey on Attention Mechanisms in Deep Learning

Attention-based CNNs for Image Classification: A Survey

Understanding More about Human and Machine Attention in Deep Neural Networks

Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work

Trends, Applications, and Challenges in Human Attention Modelling

Image Visual Attention Computation and Application Via the Learning of Object Attributes

The multi-modal fusion in visual question answering: a review of attention mechanisms

A review of visual sustained attention: neural mechanisms and computational models

Visual Attention driven by Convolutional Features

Attention, please! A survey of neural attention models in deep learning

Multimodal Continuous Visual Attention Mechanisms

Computational modelling of visual attention