Abstract: Owing to their imaging mechanisms and acquisition techniques, some depth images inevitably have low visual quality or contain foregrounds that are inconsistent with those of their corresponding RGB images. Directly using such depth images deteriorates the performance of RGB-D salient object detection (SOD). In view of this, a novel RGB-D SOD model is presented, which follows the principle of calibration-then-fusion to suppress the influence of these two types of depth images on the final saliency prediction. Specifically, the proposed model is composed of two stages, i.e., an image generation stage and a saliency reasoning stage. The former generates high-quality and foreground-consistent pseudo depth images via an image generation network, while the latter first calibrates the original depth information with the aid of the newly generated pseudo depth images and then performs cross-modal feature fusion for the final saliency reasoning. In particular, in the first stage, a Two-steps Sample Selection (TSS) strategy is employed to select reliable depth images from the original RGB-D image pairs as supervision information to optimize the image generation network. In the second stage, a Feature Calibrating and Fusing Network (FCFNet) is proposed to perform the calibration-then-fusion of cross-modal information for the final saliency prediction, by means of a Depth Feature Calibration (DFC) module, a Shallow-level Feature Injection (SFI) module and a Multi-modal Multi-scale Fusion (MMF) module. Moreover, a Region Consistency Aware (RCA) loss is presented as an auxiliary loss for FCFNet; by considering the local regional consistency in the saliency maps, it promotes the completeness of salient objects and reduces background interference. Experiments on six benchmark datasets demonstrate the superiority of the proposed RGB-D SOD model over some state-of-the-art methods.
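
The calibration-then-fusion ordering described in the abstract can be illustrated with a toy data-flow sketch. Everything below is an assumption made for illustration only: the function names (`calibrate_depth`, `fuse`, `predict_saliency`), the agreement-weighted blending rule and the averaging fusion are hypothetical stand-ins, not the DFC, SFI or MMF modules of FCFNet, whose internals the abstract does not specify. The pseudo depth image is taken as a given input rather than produced by an image generation network.

```python
import numpy as np


def calibrate_depth(depth, pseudo_depth):
    """Toy calibration (hypothetical rule, not the paper's DFC module).

    Keeps the original depth where it agrees with the pseudo depth and
    falls back to the pseudo depth where the two disagree.
    Both inputs are expected to lie in [0, 1].
    """
    agreement = 1.0 - np.abs(depth - pseudo_depth)
    return agreement * depth + (1.0 - agreement) * pseudo_depth


def fuse(rgb_feat, depth_feat):
    """Toy cross-modal fusion: element-wise average of two feature maps."""
    return 0.5 * (rgb_feat + depth_feat)


def predict_saliency(rgb, depth, pseudo_depth):
    """Calibrate first, then fuse; returns an H x W map in [0, 1]."""
    rgb_feat = rgb.mean(axis=-1)  # stand-in for RGB feature extraction
    calibrated = calibrate_depth(depth, pseudo_depth)
    fused = fuse(rgb_feat, calibrated)
    return np.clip(fused, 0.0, 1.0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rgb = rng.random((4, 4, 3))
    depth = rng.random((4, 4))
    pseudo_depth = rng.random((4, 4))
    print(predict_saliency(rgb, depth, pseudo_depth).shape)
```

The point of the sketch is only the ordering: the unreliable depth input is corrected against the pseudo depth before any cross-modal fusion takes place, so the fusion step never sees the uncalibrated depth.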
