Scene Segmentation With Dual Relation-Aware Attention Network

Jun Fu, Jing Liu, Jie Jiang, Yong Li, Yongjun Bao, Hanqing Lu

DOI: https://doi.org/10.1109/tnnls.2020.3006524

IF: 14.255

2021-06-01

IEEE Transactions on Neural Networks and Learning Systems

Abstract: In this article, we propose a Dual Relation-aware Attention Network (DRANet) to handle the task of scene segmentation. Efficiently exploiting context is essential for pixel-level recognition. To address this issue, we adaptively capture contextual information based on a relation-aware attention mechanism. Specifically, we append two types of attention modules on top of the dilated fully convolutional network (FCN), which model the contextual dependencies in the spatial and channel dimensions, respectively. In the attention modules, we adopt a self-attention mechanism to model semantic associations between any two pixels or channels. Each pixel or channel can adaptively aggregate context from all pixels or channels according to their correlations. To reduce the high computation and memory cost of this pairwise association computation, we further design two types of compact attention modules. In the compact attention modules, each pixel or channel is associated only with a small number of gathering centers and aggregates context over these gathering centers. Meanwhile, we add a cross-level gating decoder to selectively enhance spatial details, which boosts the performance of the network. We conduct extensive experiments to validate the effectiveness of our network and achieve new state-of-the-art segmentation performance on four challenging scene segmentation data sets, i.e., Cityscapes, ADE20K, PASCAL Context, and COCO Stuff. In particular, a mean IoU of 82.9% on the Cityscapes test set is achieved without using extra coarse annotated data.
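The pairwise spatial and channel attention described in the abstract can be illustrated with a short NumPy sketch. It is a minimal sketch under stated assumptions, not the authors' code: the matrices `wq`, `wk`, and `wv` stand in for 1×1 convolutions, `gamma` is a fixed scalar rather than a learned weight, the channel branch uses plain feature correlations without projections, and the two branches are fused by summation. All function names are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def spatial_attention(feat, wq, wk, wv, gamma=1.0):
    """Pairwise spatial self-attention on a (C, H, W) feature map.

    wq, wk: (C', C) query/key projections; wv: (C, C) value projection.
    These matrices stand in for 1x1 convolutions (assumption).
    """
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N) with N = H * W pixels
    q, k, v = wq @ x, wk @ x, wv @ x  # (C', N), (C', N), (C, N)
    attn = softmax(q.T @ k, axis=-1)  # (N, N): row i holds pixel i's weights over all pixels
    out = v @ attn.T                  # out[:, i] = sum_j attn[i, j] * v[:, j]
    return (gamma * out + x).reshape(c, h, w), attn


def channel_attention(feat, gamma=1.0):
    """Pairwise channel self-attention: each channel aggregates all channels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)        # (C, N)
    attn = softmax(x @ x.T, axis=-1)  # (C, C): channel-to-channel correlations
    out = attn @ x                    # channel i = weighted sum of all channels
    return (gamma * out + x).reshape(c, h, w), attn


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 5))   # C = 8 channels on a 4x5 grid
wq, wk = rng.standard_normal((2, 8)), rng.standard_normal((2, 8))
wv = rng.standard_normal((8, 8))
sp_out, sp_attn = spatial_attention(feat, wq, wk, wv)
ch_out, ch_attn = channel_attention(feat)
fused = sp_out + ch_out                 # sum the two branches (assumed fusion)
print(sp_attn.shape, ch_attn.shape, fused.shape)  # (20, 20) (8, 8) (8, 4, 5)
```

The spatial affinity map is N × N, which is where the cost comes from. For example, a 128 × 256 feature map gives N = 32,768 and an affinity map of 32,768² ≈ 1.07 billion entries; this is the computation and memory cost that the compact attention modules are designed to reduce.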

computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, hardware & architecture

What problem does this paper attempt to address?

The paper addresses how to efficiently exploit contextual information in scene segmentation. Scene segmentation assigns every pixel in an image to a semantic category, covering both objects (such as people, cars, and bicycles) and background regions (such as sky, road, and grass). Large variations in scale, occlusion, and lighting across objects and backgrounds make pixel-level recognition difficult. To address this, the authors propose the Dual Relation-aware Attention Network (DRANet), which uses a relation-aware attention mechanism to adaptively capture contextual information in both the spatial and channel dimensions, improving pixel-level recognition accuracy. The main contributions are:

1. **DRANet**: accurate pixel-level recognition by modeling contextual dependencies in the spatial and channel dimensions.
2. **Compact attention modules**: two compact attention modules (CPAM and CCAM, for the spatial and channel dimensions respectively) that associate each pixel or channel with a small number of gathering centers instead of with every other pixel or channel, improving performance while reducing computation and memory overhead.
3. **Simple decoder**: a decoder with a cross-level gating mechanism that enhances low-level features, highlights spatial details, and yields more accurate predictions.
4. **Experimental validation**: extensive experiments on four challenging scene segmentation datasets (Cityscapes, ADE20K, PASCAL Context, and COCO Stuff), achieving new state-of-the-art performance.

Through these designs, DRANet captures rich contextual information and improves scene segmentation accuracy, reaching a mean IoU of 82.9% on the Cityscapes test set without extra coarse annotated data.
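The compact spatial attention and the cross-level gating decoder can be sketched in the same spirit. This is a minimal NumPy sketch under explicit assumptions, not the published design: gathering centers are obtained here by average-pooling the feature map into 1×1, 2×2, 3×3, and 6×6 cells (K = 50 centers, a pyramid-pooling choice assumed for illustration), and `cross_level_gate` is a hypothetical sigmoid gate computed from high-level features and multiplied into low-level features. All function names are illustrative, and the channel counterpart (CCAM) is not sketched.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along `axis`.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_avg_pool(feat, bins):
    """Average-pool a (C, H, W) map into bins x bins cells; returns (C, bins*bins)."""
    c, h, w = feat.shape
    out = np.empty((c, bins, bins))
    for i in range(bins):
        h0, h1 = (i * h) // bins, -(-(i + 1) * h // bins)  # floor / ceil cell bounds
        for j in range(bins):
            w0, w1 = (j * w) // bins, -(-(j + 1) * w // bins)
            out[:, i, j] = feat[:, h0:h1, w0:w1].mean(axis=(1, 2))
    return out.reshape(c, bins * bins)


def compact_spatial_attention(feat, wq, wk, wv, bins=(1, 2, 3, 6), gamma=1.0):
    """Each pixel attends to K pooled gathering centers instead of all N pixels."""
    c, h, w = feat.shape
    x = feat.reshape(c, h * w)                                   # (C, N)
    centers = np.concatenate(
        [adaptive_avg_pool(feat, b) for b in bins], axis=1)      # (C, K)
    q = wq @ x                                                   # (C', N)
    k, v = wk @ centers, wv @ centers                            # (C', K), (C, K)
    attn = softmax(q.T @ k, axis=-1)                             # (N, K), not (N, N)
    out = v @ attn.T                                             # (C, N)
    return (gamma * out + x).reshape(c, h, w), attn


def cross_level_gate(low, high, wg):
    """Hypothetical gate: high-level features produce elementwise weights in (0, 1)
    that rescale low-level features. `high` must already match low's H x W."""
    c_low, h, w = low.shape
    z = wg @ high.reshape(high.shape[0], h * w)                  # (C_low, N)
    gate = 0.5 * (1.0 + np.tanh(0.5 * z))                        # sigmoid(z), overflow-free
    return low * gate.reshape(c_low, h, w)


rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 12, 12))  # N = 144 pixels
wq, wk = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
wv = rng.standard_normal((8, 8))
out, attn = compact_spatial_attention(feat, wq, wk, wv)
print(attn.shape)  # (144, 50): 1 + 4 + 9 + 36 = 50 centers
low = rng.standard_normal((4, 12, 12))
refined = cross_level_gate(low, out, rng.standard_normal((4, 8)))
print(refined.shape)  # (4, 12, 12)
```

The saving is in the affinity map: for a 128 × 256 feature map (N = 32,768), each pixel scores 50 centers instead of 32,768 pixels, so the map has 32,768 × 50 = 1,638,400 entries rather than about 1.07 billion.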


EHANet: Efficient Hybrid Attention Network Towards Real-time Semantic Segmentation

Deep Dual-Stream Network with Scale Context Selection Attention Module for Semantic Segmentation

ACNET: Attention Based Network to Exploit Complementary Features for RGBD Semantic Segmentation

MEDANet: More Efficient Dual Attention Network for Scene Segmentation

Adaptive multi-scale dual attention network for semantic segmentation

DPANET: Dual Pooling Attention Network for Semantic Segmentation

Attention Guided Global Enhancement and Local Refinement Network for Semantic Segmentation

Dense Relation Network: Learning Consistent and Context-Aware Representation for Semantic Image Segmentation

TCANet: three-stream coordinate attention network for RGB-D indoor semantic segmentation

AANet: Adaptive Attention Networks for Semantic Segmentation of High-Resolution Remote Sensing Imagery

Bilateral Network with Residual U-blocks and Dual-Guided Attention for Real-time Semantic Segmentation

DARSegNet: A Real-Time Semantic Segmentation Method Based on Dual Attention Fusion Module and Encoder-Decoder Network

IDRNet: Intervention-Driven Relation Network for Semantic Segmentation

Semantic segmentation of remote sensing images based on dual‐channel attention mechanism

Attention-based Dual Context Aggregation for Image Semantic Segmentation

DWRSeg: Rethinking Efficient Acquisition of Multi-scale Contextual Information for Real-time Semantic Segmentation

Dual Graph Convolutional Network for Semantic Segmentation

DRBANET: A Lightweight Dual-Resolution Network for Semantic Segmentation with Boundary Auxiliary

DOCNet: Dual-Domain Optimized Class-Aware Network for Remote Sensing Image Segmentation

Attention-Guided Unified Network for Panoptic Segmentation