Abstract: Salient object detection (SOD) based on RGB images and depth maps (RGB-D) has been well studied in recent years, especially with deep neural networks. An RGB image provides rich local and semantic features, while the depth map provides global structural information. Many researchers have treated depth information as a supplement to RGB images. However, depth maps in various datasets are less precise than the RGB information because they are captured under varying conditions. How to thoroughly exploit these features at different levels therefore remains an open problem. Many cognitive theories, such as the topological perception theory, claim that global properties are perceived prior to local ones and are important for human recognition. In this paper, we propose a novel global-prior-guided fusion network with global-prior extraction modules to fuse cross-modality features. Each module contains a cross-attention mechanism guided by deeper global priors, and the global prior extracted by the module is used to guide the processing of local features in shallower layers. The global-guided network first integrates the local and global cross features into the depth-map decoder, and the fused structural features of that decoder are then fused into the saliency decoder. Experimental results show that our method outperforms other state-of-the-art (SOTA) methods on the RGB-D SOD task across seven datasets (DUT-RGBD, NJUD, LFSD, NLPR, RGBD135, SIP, and STERE) in terms of most metrics. To further evaluate the modules we designed, we extended our model to RGB and video SOD with slight adaptations and obtained results comparable to those of SOTA methods in both fields.
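The idea of a global prior guiding local features through cross attention can be illustrated with a minimal sketch. This is not the network described in the abstract: the function name `prior_guided_cross_attention`, the use of local features as queries and prior tokens as keys and values, the residual connection, and the omission of learned projections are all assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def prior_guided_cross_attention(local_feats, global_prior):
    """Refine each local feature with a weighted mix of global-prior tokens.

    local_feats:  list of N vectors used as queries (hypothetical shallow features).
    global_prior: list of M vectors used as keys and values (hypothetical deep features).
    Learned query/key/value projections are omitted; that is an assumption for brevity.
    """
    dim = len(local_feats[0])
    scale = 1.0 / math.sqrt(dim)
    refined = []
    for query in local_feats:
        # Scaled dot-product similarity between this local feature and each prior token.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in global_prior]
        weights = softmax(scores)
        # Attention output: convex combination of the prior tokens.
        context = [
            sum(w * value[i] for w, value in zip(weights, global_prior))
            for i in range(dim)
        ]
        # Residual connection (assumed): the prior guides, rather than replaces, the local feature.
        refined.append([q + c for q, c in zip(query, context)])
    return refined


# Toy inputs: three local feature vectors and two global-prior tokens.
local_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
global_prior = [[1.0, 1.0], [2.0, 0.0]]
refined = prior_guided_cross_attention(local_feats, global_prior)
```

In this toy setting each local vector receives a context vector that lies between the prior tokens, weighted by similarity. A real model would operate on multi-channel feature maps and learn the projections.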

DIG: Dual Interaction and Guidance Network for Salient Object Detection

Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection

Towards a Complete and Detail-Preserved Salient Object Detection

MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

Double Cross-Modality Progressively Guided Network for RGB-D Salient Object Detection

Multi-Modal Salient Feature Enhance for Rgb-T Salient Object Detection

Encoder Deep Interleaved Network with Multi-Scale Aggregation for RGB-D Salient Object Detection

Depth-Induced Gap-Reducing Network for RGB-D Salient Object Detection: an Interaction, Guidance and Refinement Approach

CEMINet: Context exploration and multi-level interaction network for salient object detection

Global-prior-guided fusion network for salient object detection

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

Deep Feature Filtering and Contextual Information Gathering Network for RGB-D Salient Object Detection

Global Guidance-Based Integration Network for Salient Object Detection in Low-Light Images

Dual Attention Guided Multi-Scale Fusion Network for RGB-D Salient Object Detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

Dual-Stream Network Based on Global Guidance for Salient Object Detection

Feature Specific Progressive Improvement for Salient Object Detection

Cross-modality Salient Object Detection Network with Universality and Anti-Interference

CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection

DAGCN: Dynamic and Adaptive Graph Convolutional Network for Salient Object Detection

Dual-Stream Feature Collaboration Perception Network for Salient Object Detection in Remote Sensing Images