Attention-based Dual Context Aggregation for Image Semantic Segmentation

Zhao Dexin, Qi Zhiyang, Yang Ruixue, Wang Zhaohui
DOI: https://doi.org/10.1007/s11042-021-11094-6
IF: 2.577
2021-01-01
Multimedia Tools and Applications
Abstract: Recent works have extensively explored contextual relevance to enhance scene understanding. However, because convolution kernels are limited in size, most approaches model relationships only between local regions and rarely explore long-range dependencies. In this paper, we propose the Dual Context Aggregation Module (DCM) to capture such dependencies effectively. DCM consists of two attention modules that obtain dense contextual information by modeling relations between positions and between channels. The spatial attention module generates attention maps by constructing pairwise relationships between positions in the same row or column. The channel attention module applies the Weight Calibrate Block to generate a weight for each channel, effectively capturing the correlations between different channels. The feature maps of the two modules are integrated by element-wise addition. Moreover, we design a two-step decoder module to improve the feature representation. Building on these components, we construct the Dual Context Aggregation Network (DCNet). Extensive experiments on benchmark datasets show that our model produces robust feature representations. Our method achieves performance competitive with state-of-the-art models, with mIoU scores of 81.9% on Cityscapes and 45.54% on ADE20K.
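The abstract describes DCM only at a high level, so the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The spatial branch is assumed to follow a criss-cross pattern: each position attends to every position in its own row and column, with the centre counted once. The internals of the Weight Calibrate Block are not given, so a squeeze-and-excitation-style gate (global average pooling, two linear layers, sigmoid) is swapped in as a stand-in. The function names (`row_column_attention`, `channel_gate`, `dual_context`), the `params` dictionary, the reduction ratios and the `gamma` scale are all hypothetical, and random matrices stand in for learned 1×1 convolutions.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(z, axis=-1):
    # Numerically stable softmax; -inf entries receive zero weight.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def row_column_attention(x, wq, wk, wv, gamma=1.0):
    """Assumed criss-cross-style spatial attention on x of shape (C, H, W)."""
    C, H, W = x.shape
    # A 1x1 convolution is a per-pixel linear map over channels.
    q = np.einsum("dc,chw->dhw", wq, x)
    k = np.einsum("dc,chw->dhw", wk, x)
    v = np.einsum("dc,chw->dhw", wv, x)
    out = np.zeros_like(x)
    for i in range(H):
        for j in range(W):
            row_e = q[:, i, j] @ k[:, i, :]  # (W,) energies along row i
            col_e = q[:, i, j] @ k[:, :, j]  # (H,) energies along column j
            col_e[i] = -np.inf               # centre is already in the row
            a = softmax(np.concatenate([row_e, col_e]))
            out[:, i, j] = v[:, i, :] @ a[:W] + v[:, :, j] @ a[W:]
    return gamma * out + x  # residual connection


def channel_gate(x, w1, w2):
    """Hypothetical stand-in for the Weight Calibrate Block (SE-style gate)."""
    s = x.mean(axis=(1, 2))                  # (C,) global average pooling
    h = np.maximum(w1 @ s, 0.0)              # (C/r,) ReLU bottleneck
    wts = 1.0 / (1.0 + np.exp(-(w2 @ h)))    # (C,) one weight per channel
    return x * wts[:, None, None], wts


def dual_context(x, params):
    spatial = row_column_attention(x, params["wq"], params["wk"], params["wv"])
    channel, _ = channel_gate(x, params["w1"], params["w2"])
    return spatial + channel  # element-wise addition of the two branches


C, H, W = 8, 5, 6
x = rng.standard_normal((C, H, W))
params = {
    "wq": rng.standard_normal((C // 2, C)) * 0.1,
    "wk": rng.standard_normal((C // 2, C)) * 0.1,
    "wv": rng.standard_normal((C, C)) * 0.1,
    "w1": rng.standard_normal((C // 4, C)) * 0.1,
    "w2": rng.standard_normal((C, C // 4)) * 0.1,
}
y = dual_context(x, params)
print(y.shape)  # (8, 5, 6)
```

Both branches preserve the input shape, which is what allows them to be fused by simple element-wise addition. One pass of row/column attention connects each pixel only to its own row and column; in CCNet-style designs the module is applied twice so that information can reach every position.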