Abstract: Most existing methods for salient object detection in optical remote sensing images apply the same strategy to features at every level, without fully considering the distinct characteristics of features at different levels. As a result, some high-level semantics and low-level details are neglected during feature extraction. Furthermore, existing methods often rely on simple convolution operations to build their feature extraction and fusion modules, and the inherent locality of convolution limits their performance. To address these challenges, we propose a novel progressive complementation network with semantics and details (SDPCNet) consisting of three parts: a deep semantics aggregation module (DSAM), a semantics-guided feature complement module (SFCM), and a detail feature enhancement module (DFEM). Specifically, the DSAM is applied to the two highest-level features, guided by a global view that captures long-range dependencies and local context, generated by a transformer and dilated convolution. The DSAM mines the semantic information in high-level features in depth to perceive object positions and alleviate the adverse effects of cluttered backgrounds. The SFCM operates on the two intermediate levels of features, performing global correlation modeling on the aggregated cross-level features. It enhances multiscale semantic information and edge details using multiple sets of dilated convolutions to cope with the uncertainty in the size and number of salient objects. The DFEM acts on the two lowest levels of features, enhancing edge details in the spatial dimension and emphasizing semantics in different channel dimensions. Its output is then fused with high-level features to augment feature diversity and reduce the impact of background noise.
Extensive experiments conducted on the ORSSD, EORSSD, and ORSI-4199 datasets demonstrate that our proposed SDPCNet outperforms 23 state-of-the-art methods across eight evaluation metrics.
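
The abstract attributes the limits of plain convolution to its locality and relies on dilated convolutions to cover objects of uncertain size. The sketch below is only an illustration of that general idea, not SDPCNet's implementation: it is a one-dimensional, pure-Python toy, and the function names (`receptive_field`, `dilated_conv1d`, `multi_dilation_responses`) are hypothetical. It shows how the same 3-tap kernel sees a wider span of the input as the dilation rate grows, without adding weights.

```python
def receptive_field(kernel_size, dilation):
    """Number of input positions spanned by one output of a dilated kernel."""
    return (kernel_size - 1) * dilation + 1


def dilated_conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with gaps of `dilation` between taps.

    Illustrative only: real models use 2D, multi-channel, learned kernels.
    """
    span = receptive_field(len(kernel), dilation)
    out = []
    for start in range(len(signal) - span + 1):
        acc = 0.0
        for k, weight in enumerate(kernel):
            acc += weight * signal[start + k * dilation]
        out.append(acc)
    return out


def multi_dilation_responses(signal, kernel, dilations):
    """Apply the same kernel at several dilation rates (one response per rate)."""
    return {d: dilated_conv1d(signal, kernel, d) for d in dilations}


if __name__ == "__main__":
    # Same 3-tap kernel, increasingly wide context.
    for d in (1, 2, 4):
        print("dilation", d, "-> receptive field", receptive_field(3, d))

    signal = [1, 2, 3, 4, 5, 6, 7, 8, 9]
    responses = multi_dilation_responses(signal, [1, 1, 1], (1, 2, 4))
    for d, resp in responses.items():
        print("dilation", d, "->", resp)
```

With a kernel of size 3, dilation rates of 1, 2, and 4 span 3, 5, and 9 input positions respectively, which is why stacking several rates in parallel is a common way to respond to structures at multiple scales.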

A Unified Video Semantics Extraction and Noise Object Suppression Network for Video Saliency Detection

Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks

Semantic-aware Contrastive Learning with Proposal Suppression for Video Semantic Role Grounding

PSNet: Parallel Symmetric Network for Video Salient Object Detection

A Novel Video Salient Object Detection Method via Semi-supervised Motion Quality Perception

Self Supervised Progressive Network for High Performance Video Object Segmentation

LeNo: Adversarial Robust Salient Object Detection Networks with Learnable Noise

Learning Spatial-Semantic Features for Robust Video Object Segmentation

Video Saliency Detection Using Object Proposals

Progressive Complementation Network With Semantics and Details for Salient Object Detection in Optical Remote Sensing Images

Video Salient Object Detection via Fully Convolutional Networks

Wnet: Audio-Guided Video Object Segmentation Via Wavelet-Based Cross-Modal Denoising Networks

A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Multi-Stream Attention-Aware Graph Convolution Network for Video Salient Object Detection

Video Salient Object Detection via Contrastive Features and Attention Modules

A Unified Two-Stage Group Semantics Propagation and Contrastive Learning Network for Co-Saliency Detection

Co-saliency Detection with Intra-Group Two-Stage Group Semantics Propagation and Inter-Group Contrastive Learning

Towards Robust Video Object Segmentation with Adaptive Object Calibration

RANet: Ranking Attention Network for Fast Video Object Segmentation

Part-aware attention correctness for video salient object detection

Video object segmentation via couple streams and feature memory