Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images

Lina Huo, Jiayue Hou, Jie Feng, Wei Wang, Jinsheng Liu
DOI: https://doi.org/10.3390/rs16040624
IF: 5
2024-02-08
Remote Sensing
Abstract: Salient Object Detection (SOD) is increasingly applied to natural scene images. However, because optical remote sensing images differ markedly from natural scene images, directly applying SOD methods designed for natural scene images to optical remote sensing images yields limited performance, particularly in capturing global context information. Salient object detection in optical remote sensing images (ORSI-SOD) is therefore challenging. Optical remote sensing images usually exhibit large scale variations, yet the vast majority of networks are built on Convolutional Neural Network (CNN) backbones such as VGG and ResNet, which can only extract local features. To address this problem, we design a new model that employs a transformer-based backbone network capable of extracting global information and long-range dependencies. The proposed framework is named Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images (GMANet). In this framework, the Pyramid Vision Transformer (PVT) serves as the encoder to capture long-range dependencies. A Multiscale Attention Module (MAM) is introduced to extract multiscale information. Meanwhile, a Global Guided Branch (GGB) is used to learn global context information and obtain the complete structure; four MAMs are densely connected to this GGB. The Aggregate Refinement Module (ARM) enriches the details of edges and low-level features: it fuses global context information with multilevel encoder features to complement the details while keeping the structure complete. Extensive experiments on two public datasets show that the proposed GMANet outperforms 28 state-of-the-art methods on six evaluation metrics, especially E-measure and F-measure. We attribute this to the coarse-to-fine strategy used to merge global context information and multiscale information.
environmental sciences,imaging science & photographic technology,remote sensing,geosciences, multidisciplinary
What problem does this paper attempt to address?
The problem that this paper attempts to solve is the challenge of salient object detection in optical remote sensing images (ORSI-SOD). Optical remote sensing images differ significantly from natural scene images, so salient object detection methods designed for natural scene images perform poorly when applied to them directly, especially in capturing global context information.

### Main problems include:

1. **Insufficient global context information**: Methods built on convolutional neural network (CNN) backbones, such as VGG and ResNet, can only extract local features and cannot capture long-range dependencies, resulting in false detections, missed detections, and inaccurate saliency maps.
2. **Large multiscale variation**: The size of objects in optical remote sensing images varies greatly, and CNN-based methods have difficulty handling this scale variation.
3. **High background complexity**: The background of optical remote sensing images usually contains textures and structures that are more complex than those of natural scene images, which makes detection harder.

### Solutions:

To solve the above problems, the authors propose a new framework named "Global and Multiscale Aggregate Network" (GMANet). The main features of GMANet are as follows:

1. **Using Pyramid Vision Transformer (PVT-v2) as an encoder**: PVT-v2 can extract global information and long-range dependencies, making up for the deficiency of CNNs in global context.
2. **Introducing the Multiscale Attention Module (MAM)**: MAM extracts multiscale features by combining convolution kernels of different sizes, addressing the problem of large scale variation.
3. **Designing the Global Guided Branch (GGB)**: GGB learns global context information and generates complete structural information through four densely connected MAMs.
4. **Proposing the Aggregate Refinement Module (ARM)**: ARM fuses global guidance information and low-level features through a coarse-to-fine strategy to locate salient objects accurately and enhance details.

### Summary:

GMANet improves the accuracy and completeness of salient object detection in optical remote sensing images by combining global context information, multiscale features, and dense connections. The experimental results on two public datasets show that GMANet outperforms 28 existing state-of-the-art methods on six evaluation metrics, especially E-measure and F-measure.
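
For readers unfamiliar with the F-measure mentioned above, the following is a minimal sketch of how it is commonly computed for a single saliency map. It is not taken from the paper: the function name `f_measure`, the fixed binarization threshold of 0.5, and the weight β² = 0.3 (a conventional choice in the SOD literature) are all assumptions made for illustration. The paper's own evaluation protocol may use adaptive or swept thresholds instead.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """Weighted F-measure for one saliency map (illustrative sketch).

    pred: 2D list of saliency scores in [0, 1].
    gt:   2D list of binary ground-truth labels (0 or 1).
    threshold and beta_sq are assumed defaults, not values from the paper.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for score, label in zip(pred_row, gt_row):
            predicted_salient = score >= threshold
            if predicted_salient and label:
                tp += 1  # salient pixel correctly detected
            elif predicted_salient and not label:
                fp += 1  # background pixel marked as salient
            elif not predicted_salient and label:
                fn += 1  # salient pixel missed

    if tp == 0:
        return 0.0  # no correct detections: precision and recall are zero

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F_beta = (1 + beta^2) * P * R / (beta^2 * P + R)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


if __name__ == "__main__":
    prediction = [[0.9, 0.8], [0.2, 0.7]]
    ground_truth = [[1, 1], [0, 0]]
    print(round(f_measure(prediction, ground_truth), 4))
```

With β² below 1, precision is weighted more heavily than recall, so a map that marks background pixels as salient is penalized more than one that misses part of the object.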