Abstract:Salient Object Detection (SOD) is gradually applied in natural scene images. However, due to the apparent differences between optical remote sensing images and natural scene images, directly applying the SOD of natural scene images to optical remote sensing images has limited performance in global context information. Therefore, salient object detection in optical remote sensing images (ORSI-SOD) is challenging. Optical remote sensing images usually have large-scale variations. However, the vast majority of networks are based on Convolutional Neural Network (CNN) backbone networks such as VGG and ResNet, which can only extract local features. To address this problem, we designed a new model that employs a transformer-based backbone network capable of extracting global information and remote dependencies. A new framework is proposed for this question, named Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images (GMANet). In this framework, the Pyramid Vision Transformer (PVT) is an encoder to catch remote dependencies. A Multiscale Attention Module (MAM) is introduced for extracting multiscale information. Meanwhile, a Global Guiled Brach (GGB) is used to learn the global context information and obtain the complete structure. Four MAMs are densely connected to this GGB. The Aggregate Refinement Module (ARM) is used to enrich the details of edge and low-level features. The ARM fuses global context information and encoder multilevel features to complement the details while the structure is complete. Extensive experiments on two public datasets show that our proposed framework GMANet outperforms 28 state-of-the-art methods on six evaluation metrics, especially E-measure and F-measure. It is because we apply a coarse-to-fine strategy to merge global context information and multiscale information.

What problem does this paper attempt to address?

The problem that this paper attempts to solve is the challenge of salient object detection in optical remote - sensing images (ORSI - SOD). Specifically, there are significant differences between optical remote - sensing images and natural - scene images. When directly applying the salient object detection methods for natural - scene images to optical remote - sensing images, the performance is limited, especially in terms of global context information. ### Main problems include: 1. **Insufficient global context information**: Traditional methods based on convolutional neural networks (CNNs), such as VGG and ResNet, can only extract local features and are unable to capture long - range dependencies, resulting in false detections, omissions, and inaccuracies in salient object detection. 2. **Large multi - scale variation**: The size of objects in optical remote - sensing images varies greatly, and traditional methods have difficulty effectively dealing with this scale variation. 3. **High background complexity**: The background of optical remote - sensing images usually contains complex textures and structures, which are more complex than those of natural - scene images, increasing the detection difficulty. ### Solutions: To solve the above problems, the author proposes a new framework named "Global and Multi - scale Aggregation Network" (GMANet). The main features of GMANet are as follows: 1. **Using Pyramid Vision Transformer (PVT - v2) as an encoder**: PVT - v2 can extract global information and long - range dependencies, making up for the deficiency of CNNs in global context. 2. **Introducing the Multi - scale Attention Module (MAM)**: MAM extracts multi - scale features by combining convolution kernels of different sizes, solving the problem of large - scale variation. 3. **Designing the Global Guidance Branch (GGB)**: GGB is used to learn global context information and generate complete structural information through four densely - connected MAM modules. 4. **Proposing the Aggregation Refinement Module (ARM)**: ARM fuses global guidance information and low - level features through a coarse - to - fine strategy to ensure the accurate positioning of salient objects and enhance details. ### Summary: GMANet significantly improves the accuracy and integrity of salient object detection in optical remote - sensing images by combining global context information, multi - scale features, and dense connections. The experimental results show that GMANet outperforms 28 existing state - of - the - art methods in multiple evaluation metrics, especially in E - measure and F - measure.

Global and Multiscale Aggregate Network for Saliency Object Detection in Optical Remote Sensing Images

Multi-scale Feature Aggregation Network for Salient Object Detection in Optical Remote Sensing Images

Multilevel Interactive Reverse-Guided Network for Salient Object Detection in Optical Remote Sensing Images

Improvement of Subcutaneous Nutritional Blood Flow in the Forefoot by Hydroxyethylrutosides in Patients with Arterial Insufficiency: Case Studies

Multiscale and Multidimensional Weighted Network for Salient Object Detection in Optical Remote Sensing Images

Salient Object Detection in Optical Remote Sensing Images Based on Global Context Mixed Attention

Global Perception Network for Salient Object Detection in Remote Sensing Images

Multiscale Feature Enhancement Network for Salient Object Detection in Optical Remote Sensing Images

Global Semantic-Sense Aggregation Network for Salient Object Detection in Remote Sensing Images

Global–Local Semantic Interaction Network for Salient Object Detection in Optical Remote Sensing Images With Scribble Supervision

SGMFNet: a remote sensing image object detection network based on spatial global attention and multi-scale feature fusion

Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images

Object Detection Based on Saturation of Visual Perception

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images

MFCANet: Multiscale Feature Context Aggregation Network for Oriented Object Detection in Remote-Sensing Images

RRNet: Relational Reasoning Network With Parallel Multiscale Attention for Salient Object Detection in Optical Remote Sensing Images

Semantic-Guided Attention Refinement Network for Salient Object Detection in Optical Remote Sensing Images

United Domain Cognition Network for Salient Object Detection in Optical Remote Sensing Images

CRNet: Channel-Enhanced Remodeling-Based Network for Salient Object Detection in Optical Remote Sensing Images

Object Detection Based on Global-Local Saliency Constraint in Aerial Images

A Deep Multiscale Fusion Method via Low-Rank Sparse Decomposition for Object Saliency Detection Based on Urban Data in Optical Remote Sensing Images