SACANet: scene-aware class attention network for semantic segmentation of remote sensing images

Xiaowen Ma, Rui Che, Tingfeng Hong, Mengting Ma, Ziyan Zhao, Tian Feng, Wei Zhang
2023-04-22
Abstract: The spatial attention mechanism has been widely used in semantic segmentation of remote sensing images given its capability to model long-range dependencies. Many methods adopting spatial attention aggregate contextual information using direct relationships between pixels within an image, while ignoring the scene awareness of pixels (i.e., being aware of the global context of the scene where the pixels are located and perceiving their relative positions). Given the observation that scene awareness benefits context modeling with spatial correlations of ground objects, we design a scene-aware attention module based on a refined spatial attention mechanism embedding scene awareness. Besides, we present a local-global class attention mechanism to address the problem that general attention mechanisms introduce excessive background noise while hardly considering the large intra-class variance in remote sensing images. In this paper, we integrate both scene-aware and class attentions to propose a scene-aware class attention network (SACANet) for semantic segmentation of remote sensing images. Experimental results on three datasets show that SACANet outperforms other state-of-the-art methods and validate its effectiveness. Code is available at https://github.com/xwmaxwma/rssegmentation.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper aims to address two main issues in semantic segmentation of remote sensing images:

1. **Insufficient Scene Awareness**: Traditional spatial attention mechanisms capture direct relationships between pixels but overlook how each pixel relates to the scene as a whole (i.e., the global context of the scene where the pixel is located and its relative position). To address this, the authors propose a scene-aware attention (SAA) module, built on a refined spatial attention mechanism that embeds scene awareness and thereby exploits the spatial correlation of ground objects.
2. **Large Intra-class Variation and Background Noise**: Remote sensing images typically have complex backgrounds and significant intra-class differences. Conventional attention mechanisms rely on dense similarity calculations, so they tend to introduce excessive background noise and struggle to handle intra-class variability. To address this, the authors introduce a local-global class attention mechanism, which associates pixels with global class representations through local class representations as intermediary elements, achieving efficient and accurate class-level context modeling.

Combining these two components, the authors propose SACANet, a network that integrates scene awareness and class attention to improve semantic segmentation of remote sensing images. Experimental results show that SACANet outperforms existing state-of-the-art methods on three benchmark datasets and achieves a good balance between accuracy and efficiency.
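To make the local-global class attention idea more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the probability-weighted class centers, the square non-overlapping patches, and the scaled dot-product affinity are all hypothetical choices made for this example. Each pixel is compared only with the class centers of its own patch (the local representations), and the resulting weights are then used to mix the image-wide class centers (the global representations).

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_centers(feats, probs):
    # feats: (N, C) pixel features, probs: (N, K) coarse class probabilities.
    # Returns (K, C): one probability-weighted average feature per class.
    weights = probs / (probs.sum(axis=0, keepdims=True) + 1e-6)
    return weights.T @ feats

def local_global_class_attention(feats, probs, patch):
    # feats: (H, W, C), probs: (H, W, K); patch must divide H and W (assumption).
    H, W, C = feats.shape
    K = probs.shape[-1]
    # Global class representations, computed over the whole image.
    global_centers = class_centers(feats.reshape(-1, C), probs.reshape(-1, K))
    out = np.empty_like(feats)
    for i in range(0, H, patch):
        for j in range(0, W, patch):
            f = feats[i:i + patch, j:j + patch].reshape(-1, C)
            p = probs[i:i + patch, j:j + patch].reshape(-1, K)
            # Local class representations, computed inside this patch only.
            local_centers = class_centers(f, p)
            # Pixel-to-local-class affinity: (n, K) instead of a dense (N, N) map.
            affinity = softmax(f @ local_centers.T / np.sqrt(C), axis=1)
            # The local affinities select among the global class representations.
            out[i:i + patch, j:j + patch] = (affinity @ global_centers).reshape(patch, patch, C)
    return out

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 8, 4))
probs = softmax(rng.normal(size=(8, 8, 3)), axis=-1)
context = local_global_class_attention(feats, probs, patch=4)
print(context.shape)
```

In this sketch the attention map for each pixel has only K entries (one per class) rather than one per pixel, which is one way to read the summary's claim that class-level modeling avoids dense similarity calculations and the background noise they bring in.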