Meng-Hao Guo, Tian-Xing Xu, Jiang-Jiang Liu, Zheng-Ning Liu, Peng-Tao Jiang, Tai-Jiang Mu, Song-Hai Zhang, Ralph R. Martin, Ming-Ming Cheng, Shi-Min Hu
Abstract: Humans can naturally and effectively find salient regions in complex scenes. Motivated by this observation, attention mechanisms were introduced into computer vision with the aim of imitating this aspect of the human visual system. Such an attention mechanism can be regarded as a dynamic weight adjustment process based on features of the input image. Attention mechanisms have achieved great success in many visual tasks, including image classification, object detection, semantic segmentation, video understanding, image generation, 3D vision, multimodal tasks, and self-supervised learning. In this survey, we provide a comprehensive review of various attention mechanisms in computer vision and categorize them according to approach, such as channel attention, spatial attention, temporal attention, and branch attention; a related repository https://github.com/MenghaoGuo/Awesome-Vision-Attentions is dedicated to collecting related work. We also suggest future directions for attention mechanism research.
What problem does this paper attempt to address?
The paper is a comprehensive review of attention mechanisms in computer vision. Its core objective is to summarize and classify the attention methods currently used in the field. Specifically, its main contributions are:
1. **Systematic Review**: Provides a systematic review of visual attention methods, covering a unified description of attention mechanisms, their development history, and current research progress.
2. **Classification System**: Proposes a classification of attention methods by data domain (channel, spatial, temporal, and so on), so that the methods can be studied independently of any particular application.
3. **Future Directions**: Suggests directions for future research on visual attention mechanisms.
The paper first defines a general form of the attention mechanism, in which attention is viewed as a process that dynamically adjusts weights according to the importance of input features. It then introduces each type of attention mechanism in detail, including channel, spatial, and temporal attention, and analyzes the development history, motivation, formulation, and function of each.
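To make the general form concrete: the survey writes it as `Attention = f(g(x), x)`, where `g(x)` generates the attention weights and `f` processes the input `x` using those weights. The sketch below instantiates this with SENet-style channel attention in plain Python: squeeze each channel by global average pooling, excite with two fully connected layers (ReLU, then sigmoid), and rescale each channel by its weight. It is a minimal illustration under stated assumptions, not the paper's code. The helper names and the tiny weight matrices `w1` and `w2` are hypothetical; in a real network these weights are learned, and the hidden layer has C/r units for some reduction ratio r (here C = 2 and one hidden unit).

```python
import math


def global_avg_pool(x):
    # Squeeze: average each channel (an H x W list of lists) to one scalar.
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in x]


def matvec(w, v):
    # Plain matrix-vector product: w is a list of rows, v is a list.
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def g(x, w1, w2):
    """Generate channel attention (SE-style): pool -> FC -> ReLU -> FC -> sigmoid."""
    z = global_avg_pool(x)
    h = [max(0.0, t) for t in matvec(w1, z)]
    return [1.0 / (1.0 + math.exp(-t)) for t in matvec(w2, h)]


def f(a, x):
    """Process the input with the attention: rescale each channel by its weight."""
    return [[[a_c * v for v in row] for row in ch] for a_c, ch in zip(a, x)]


def attention(x, w1, w2):
    # The general form Attention = f(g(x), x).
    return f(g(x, w1, w2), x)


# Hypothetical example: 2 channels of size 2 x 2, fixed (not learned) weights.
x = [[[1.0, 1.0], [1.0, 1.0]], [[3.0, 3.0], [3.0, 3.0]]]
w1 = [[0.5, 0.5]]     # 2 channels -> 1 hidden unit
w2 = [[1.0], [-1.0]]  # 1 hidden unit -> 2 channel weights
out = attention(x, w1, w2)
# channel weights are sigmoid(2) ≈ 0.881 and sigmoid(-2) ≈ 0.119
```

The split into `g` and `f` mirrors the general form: swapping in a different `g` (for example one that produces a per-pixel map instead of per-channel weights) turns the same skeleton into spatial attention.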
The paper emphasizes the importance and potential of attention mechanisms in vision tasks, suggesting that attention-based models may replace convolutional neural networks as a more powerful and general architecture for computer vision. It also discusses the characteristics and contributions of representative works such as SENet, STN (Spatial Transformer Networks), and Non-Local Networks.
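A Non-Local block can similarly be sketched as spatial self-attention. Each output position is a weighted sum of the features at all positions, and the weights are normalized pairwise affinities. The sketch below uses the Gaussian affinity exp(x_i · x_j) normalized over j, which is a softmax. As simplifying assumptions, the value embedding and the output projection W_z are both taken as identities, so only the residual form z_i = x_i + y_i remains. The feature map is flattened to a list of N position vectors, and all function names are illustrative.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def non_local(x):
    """x: list of N positions, each a C-dim feature vector (flattened feature map)."""
    out = []
    for xi in x:
        # Gaussian affinity of position i with every position j, normalized over j.
        weights = softmax([dot(xi, xj) for xj in x])
        # y_i: affinity-weighted sum of all positions (value embedding = identity).
        yi = [sum(w * xj[c] for w, xj in zip(weights, x)) for c in range(len(xi))]
        # Residual connection z_i = x_i + y_i (W_z = identity, an assumption).
        out.append([a + b for a, b in zip(xi, yi)])
    return out


z = non_local([[1.0, 0.0], [0.0, 1.0]])
# z ≈ [[1.731, 0.269], [0.269, 1.731]]
```

Unlike a convolution, every output here depends on every input position, which is the long-range dependency the Non-Local design targets. The cost is O(N²) affinity computations.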
In short, the paper uses a systematic literature review and classification to give readers a comprehensive view of attention mechanisms in computer vision and to point out directions for further research in the field.