Abstract: RGB-D salient object detection simulates human visual attention and aims to locate the most visually prominent object(s) in paired RGB and depth images. Existing works often follow a deterministic decoding network, and few methods explicitly consider how to establish connections between features at different levels. To this end, we propose a cascaded refined RGB-D salient object detection network based on the attention mechanism (CRNet), whose primary contribution is a cascaded refined upsampling network layout. Specifically, we introduce an adaptive channel transformation ratio α in the micro modification module of convolutional block attention (MM), which adjusts the feature channel conversion ratio according to the level of the original input depth feature to maximize the integration of contextual information during the feature extraction phase. For multi-modal feature interaction, we propose a contextual feature aggregation module (ACF) consisting of separable convolution, dilated convolution, and adaptive average pooling, which extends the receptive fields of the multi-modal fused features, reduces redundant information, and decreases background noise interference. Furthermore, we are the first to propose a cascaded refined upsampling network, a precise refining process that includes personal refinement, team expansion, and sequential execution operations. Most of these operations are performed in a new sequential refinement module based on the attention mechanism (SRM-Wm). CRNet is trained under the supervision of a new hybrid loss function. Experimental results show that our model is structurally simple yet very effective, outperforming 19 state-of-the-art methods on six public datasets across four metrics (a 1.6% improvement in F-measure over the top-ranked model, BBSNet-TIP2021). The code and results of our method are available at https://github.com/guanyuzong/CR-Net.
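The abstract reports its headline gain in F-measure, so a minimal sketch of how that metric is commonly computed for a single saliency map may help. This is not taken from the CRNet code: the function name `f_measure`, the fixed binarization threshold of 0.5, and the weighting β² = 0.3 (a conventional choice in salient object detection) are assumptions for illustration, and published evaluations often sweep thresholds or use an adaptive one instead.

```python
def f_measure(pred, gt, threshold=0.5, beta_sq=0.3):
    """F-measure of one saliency map against a binary ground-truth mask.

    pred: 2-D list of saliency scores in [0, 1]
    gt:   2-D list of 0/1 labels with the same shape
    threshold and beta_sq are illustrative assumptions, not values from the paper.
    """
    tp = fp = fn = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            salient = p >= threshold  # binarize the predicted score
            if salient and g:
                tp += 1               # correctly detected salient pixel
            elif salient and not g:
                fp += 1               # background marked as salient
            elif not salient and g:
                fn += 1               # salient pixel that was missed
    if tp == 0:
        return 0.0                    # avoid division by zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Weighted harmonic mean; beta_sq < 1 emphasizes precision over recall.
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall)


# Hypothetical 2x2 example: one hit, one false alarm, one miss.
pred = [[0.9, 0.8], [0.2, 0.1]]
gt = [[1, 0], [1, 0]]
score = f_measure(pred, gt)
```

In this toy case precision and recall are both 0.5, so the weighted score is also 0.5 regardless of β². Dataset-level numbers are normally obtained by averaging such per-image scores.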

A Cascaded Refined RGB-D Salient Object Detection Network Based on the Attention Mechanism
