Abstract: Owing to their imaging mechanisms and techniques, some depth images inevitably have low visual quality or contain foregrounds that are inconsistent with those of their corresponding RGB images. Directly using such depth images degrades the performance of RGB-D salient object detection (SOD). In view of this, a novel RGB-D SOD model is presented, which follows the principle of calibration-then-fusion to effectively suppress the influence of these two types of depth images on the final saliency prediction. Specifically, the proposed model is composed of two stages, i.e., an image generation stage and a saliency reasoning stage. The former generates high-quality and foreground-consistent pseudo depth images via an image generation network, while the latter first calibrates the original depth information with the aid of the newly generated pseudo depth images and then performs cross-modal feature fusion for the final saliency reasoning. In particular, in the first stage, a Two-steps Sample Selection (TSS) strategy is employed to select reliable depth images from the original RGB-D image pairs as supervision information to optimize the image generation network. In the second stage, a Feature Calibrating and Fusing Network (FCFNet) is proposed to achieve the calibration-then-fusion of cross-modal information for the final saliency prediction by means of a Depth Feature Calibration (DFC) module, a Shallow-level Feature Injection (SFI) module and a Multi-modal Multi-scale Fusion (MMF) module. Moreover, a Region Consistency Aware (RCA) loss is presented as an auxiliary loss for FCFNet; by considering the local regional consistency in the saliency maps, it promotes the completeness of salient objects while reducing background interference. Experiments on six benchmark datasets demonstrate the superiority of the proposed RGB-D SOD model over several state-of-the-art methods.
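
The order of operations implied by the calibration-then-fusion principle can be illustrated with a toy sketch. It is not the proposed model: every function below is a hypothetical stand-in chosen only to show the data flow (pseudo depth generation, then depth calibration, then cross-modal fusion), and the agreement threshold `tol`, the channel-mean "generator" and the averaging "fusion" are all assumptions made for illustration. The actual image generation network, DFC, SFI and MMF modules are learned networks whose details are not given in the abstract.

```python
# Toy illustration of the calibration-then-fusion data flow.
# All functions are hypothetical stand-ins, NOT the paper's networks.

def generate_pseudo_depth(rgb):
    """Stage 1 stand-in: the paper uses a learned image generation network.
    Here the pseudo depth is simply the per-pixel mean of the RGB channels."""
    return [[sum(px) / 3.0 for px in row] for row in rgb]


def calibrate_depth(depth, pseudo_depth, tol=0.2):
    """Stage 2a stand-in: keep the original depth where it agrees with the
    pseudo depth within `tol` (an assumed threshold); otherwise fall back
    to the pseudo depth value."""
    return [
        [d if abs(d - p) <= tol else p for d, p in zip(d_row, p_row)]
        for d_row, p_row in zip(depth, pseudo_depth)
    ]


def fuse(rgb, calibrated_depth):
    """Stage 2b stand-in: average the RGB intensity and the calibrated depth.
    The paper instead fuses multi-scale cross-modal features."""
    return [
        [0.5 * (sum(px) / 3.0) + 0.5 * d for px, d in zip(rgb_row, d_row)]
        for rgb_row, d_row in zip(rgb, calibrated_depth)
    ]


def predict_saliency(rgb, depth):
    """Run the stages in order: generate, calibrate, then fuse."""
    pseudo = generate_pseudo_depth(rgb)
    calibrated = calibrate_depth(depth, pseudo)
    return fuse(rgb, calibrated)


# One-row image: a bright pixel with consistent depth, and a dark pixel
# whose original depth value disagrees with the pseudo depth.
rgb = [[(0.9, 0.9, 0.9), (0.0, 0.0, 0.0)]]
depth = [[0.8, 0.9]]
saliency = predict_saliency(rgb, depth)
```

In this toy input the second pixel's unreliable depth value is replaced during calibration, so it no longer contributes to the fused response, which is the effect the calibration step is meant to have before fusion.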

CCAFNet: Crossflow and Cross-Scale Adaptive Fusion Network for Detecting Salient Objects in RGB-D Images

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection

Feature Calibrating and Fusing Network for RGB-D Salient Object Detection

An adaptive guidance fusion network for RGB-D salient object detection

Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection

SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection

HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection

MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection

Cascaded Cross-Modality Fusion Network for 3D Object Detection

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

CAFCNet: Cross-modality asymmetric feature complement network for RGB-T salient object detection

AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection

Cross-modal refined adjacent-guided network for RGB-D salient object detection

C$^{2}$DFNet: Criss-Cross Dynamic Filter Network for RGB-D Salient Object Detection

A Saliency Enhanced Feature Fusion based multiscale RGB-D Salient Object Detection Network

Feature interaction and two-stage cross-modal fusion for RGB-D salient object detection

Global-prior-guided fusion network for salient object detection