Semantic Segmentation Based on Vision Transformer Via Interactive Attention

Tao Qiu, Yu Xiao, Qi Yang, Xinqi Jiang, Taiping Zhang
DOI: https://doi.org/10.1109/smc53992.2023.10394249
2023-01-01
Abstract: Semantic segmentation is a fundamental computer vision task that aims to classify every pixel of an image. Convolutional Neural Networks (CNNs) have been the backbone of typical semantic segmentation methods. However, the recent success of the Transformer architecture in natural language processing has led to its application to image semantic segmentation. Transformer-based methods mainly focus on learning more effective information through the encoder and pay less attention to the decoder. In this paper, we propose a novel attention-based decoder module called the Attention In Attention (AIA) module. This module employs interactive attention to extract spatial and channel information and to dynamically determine feature importance. Additionally, we propose the Feature Position Offset Estimation Module (FPOEM) to mitigate the feature misalignment that arises when features of different scales are fused. Experiments on two datasets, Cityscapes and ADE20K, show that the proposed method achieves state-of-the-art performance.
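The abstract does not spell out how the AIA module combines its two kinds of attention, so the following is only a minimal illustrative sketch of generic channel and spatial gating, not the authors' implementation. Every name here (`channel_gate`, `spatial_gate`, `attend`), the use of average pooling with a sigmoid, and the choice to compute the spatial gate from the channel-reweighted features are assumptions made for illustration.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def channel_gate(feat):
    """Assumed channel attention: one weight per channel from global average pooling."""
    return [sigmoid(sum(map(sum, ch)) / (len(ch) * len(ch[0]))) for ch in feat]

def spatial_gate(feat):
    """Assumed spatial attention: one weight per pixel from the mean over channels."""
    c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [[sigmoid(sum(feat[k][i][j] for k in range(c)) / c) for j in range(w)]
            for i in range(h)]

def attend(feat):
    """Reweight a C x H x W feature map (nested lists) by channel, then by position.

    The spatial gate is computed from the channel-reweighted features, which is
    one hypothetical way to let the two attentions influence each other.
    """
    cw = channel_gate(feat)
    scaled = [[[v * cw[k] for v in row] for row in ch] for k, ch in enumerate(feat)]
    sw = spatial_gate(scaled)
    return [[[v * sw[i][j] for j, v in enumerate(row)] for i, row in enumerate(ch)]
            for ch in scaled]

# Toy input: 2 channels, 2 x 2 pixels; only one pixel of channel 1 is active.
feat = [[[0.0, 0.0], [0.0, 0.0]],
        [[2.0, 0.0], [0.0, 0.0]]]
out = attend(feat)
```

In this toy example the output keeps the input shape, and the single active value is scaled down by both gates rather than passed through unchanged. A learned decoder would replace the fixed pooling with trainable projections.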