Abstract: Deep convolutional neural network (CNN) models have achieved remarkable success in semantic image segmentation. However, most segmentation models are built on classification networks, which tend to learn image-level features and lose abundant spatial information through repeated pooling and downsampling operations. CNN-based methods are also not robust to variations in the input. As a result, directly applying existing segmentation methods to semantic video segmentation yields predictions that are spatially inconsecutive within one instance and temporally inconsistent for the same objects across adjacent frames. To tackle this challenge, we propose an Attention-Guided Network (AGNet) that adaptively strengthens inter-frame and intra-frame features for more precise segmentation predictions. Specifically, we append an adjacent attention module (AAM) and a spatial attention module (SAM) on top of a dilated fully convolutional network (FCN); the two modules model feature correlations in the temporal and spatial dimensions, respectively. The AAM selectively enhances the inter-frame features of the same objects across adjacent frames for temporally consistent predictions. Meanwhile, the SAM selectively aggregates the intra-frame features within one instance for spatially consecutive predictions. Finally, we sum the outputs of the two attention modules to further improve the feature representations, which contributes to more precise segmentation predictions across the temporal and spatial dimensions simultaneously. Extensive experiments demonstrate the effectiveness of the proposed method, which obtains a state-of-the-art mean intersection over union (mIoU) of 75.22% on the CamVid dataset.
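
The two-branch idea in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the internals of the AAM or SAM, so the sketch assumes plain scaled dot-product attention with a residual connection and omits the learned projection layers a real module would use. The function names (`attend`, `spatial_attention`, `adjacent_attention`, `fuse`) and the feature-map layout `(C, H, W)` are hypothetical choices made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attend(query_feat, key_feat):
    """Residual dot-product attention between two (C, H, W) feature maps.

    Every position of query_feat gathers a weighted sum of the features at
    all positions of key_feat. Assumption: no learned projections; a real
    module would typically apply 1x1 convolutions first.
    """
    c, h, w = query_feat.shape
    q = query_feat.reshape(c, -1).T            # (N, C), N = H * W
    k = key_feat.reshape(c, -1)                # (C, N)
    v = key_feat.reshape(c, -1).T              # (N, C)
    weights = softmax(q @ k / np.sqrt(c), axis=-1)  # (N, N), rows sum to 1
    gathered = (weights @ v).T.reshape(c, h, w)
    return query_feat + gathered               # residual connection


def spatial_attention(current):
    """Intra-frame branch (stand-in for the SAM): the frame attends to itself."""
    return attend(current, current)


def adjacent_attention(current, adjacent):
    """Inter-frame branch (stand-in for the AAM): the current frame attends
    to the features of an adjacent frame."""
    return attend(current, adjacent)


def fuse(current, adjacent):
    """Sum the two branch outputs, as the abstract describes."""
    return spatial_attention(current) + adjacent_attention(current, adjacent)
```

Calling `fuse(current, adjacent)` on two feature maps of the same shape returns a fused map of that shape, which a segmentation head could then classify per pixel. The attention matrix has one row and one column per spatial position, so this sketch is only practical on downsampled feature maps.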

Region-and-Attention Network for Semantic Segmentation

PPNet: Pooling Position Attention Network for Semantic Segmentation

Fully Attentional Network for Semantic Segmentation

RAANet: A Residual ASPP with Attention Framework for Semantic Segmentation of High-Resolution Remote Sensing Images

Rcanet: row-column attention network for semantic segmentation

Semantic Segmentation of Remote Sensing Image Based on Regional Self-Attention Mechanism

Scale Channel Attention Network for Image Segmentation

Realtime Global Attention Network for Semantic Segmentation

Attention-Guided Network for Semantic Video Segmentation

Scene Segmentation With Dual Relation-Aware Attention Network

SPANet: Successive Pooling Attention Network for Semantic Segmentation of Remote Sensing Images

Semantic Segmentation With Attention Mechanism for Remote Sensing Images

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

MASANet: Multi-Angle Self-Attention Network for Semantic Segmentation of Remote Sensing Images

DPANET: Dual Pooling Attention Network for Semantic Segmentation

Global-Local Attention Network for Semantic Segmentation in Aerial Images

Scale-aware Attention Network for Weakly Supervised Semantic Segmentation

Using Features Specifically: an Efficient Network for Scene Segmentation Based on Dedicated Attention Mechanisms

EANET: Efficient Attention-Augmented Network for Real-Time Semantic Segmentation

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Point Attention Network for Point Cloud Semantic Segmentation