Abstract: RGB-D salient object detection aims to identify the objects that most attract an observer's attention in a pair of color and depth images. As an important branch of salient object detection, it faces two major challenges: how to achieve cross-modal fusion that is both efficient and beneficial for salient object detection, and how to effectively extract information from depth images of relatively poor quality. This paper proposes a cross-modal adaptive gated fusion generative adversarial network for RGB-D salient object detection using color and depth images. Specifically, the generator network adopts a double-stream encoder-decoder network and receives the RGB and depth images simultaneously. The proposed depthwise separable residual convolution module processes deep semantic information, and the processed features are progressively combined with the side-output features of the encoder network. To compensate for the poor quality of the depth image, the proposed method adds cross-modal guidance from the side-output features of the RGB stream to the decoder network of the depth stream. The discriminator network adaptively fuses the features of the two streams using a gated fusion module, and the resulting gated fusion saliency map is then passed to the discriminator, which judges its similarity to the ground-truth map. Adversarial learning improves both the generator network and the discriminator network, and the gated fusion saliency map generated by the best generator network serves as the final result. Experiments on five public RGB-D datasets demonstrate the effectiveness of cross-modal fusion, depthwise separable residual convolution, and adaptive gated fusion. Compared with state-of-the-art methods, our method achieves better performance.
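
The abstract does not give the form of the gated fusion module, so the following is only a minimal sketch of the general idea of gating between two streams, not the authors' implementation. It assumes a per-pixel sigmoid gate computed from the RGB-stream and depth-stream saliency values, with fixed scalar weights standing in for whatever learned parameters the actual module uses. The names `gated_fusion`, `w_rgb`, `w_depth`, and `bias` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(rgb_map, depth_map, w_rgb=1.0, w_depth=1.0, bias=0.0):
    """Fuse two same-sized saliency maps (lists of rows) with a per-pixel gate.

    Assumption: the gate is sigmoid(w_rgb * r + w_depth * d + bias).
    In a trained network these weights would be learned; here they are
    fixed scalars chosen purely for illustration.
    """
    if len(rgb_map) != len(depth_map):
        raise ValueError("saliency maps must have the same number of rows")
    fused = []
    for rgb_row, depth_row in zip(rgb_map, depth_map):
        if len(rgb_row) != len(depth_row):
            raise ValueError("saliency maps must have the same row length")
        fused_row = []
        for r, d in zip(rgb_row, depth_row):
            g = sigmoid(w_rgb * r + w_depth * d + bias)
            # Convex combination: g weights the RGB stream, (1 - g) the depth stream.
            fused_row.append(g * r + (1.0 - g) * d)
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    rgb = [[0.9, 0.1], [0.8, 0.2]]
    depth = [[0.6, 0.3], [0.1, 0.2]]
    for row in gated_fusion(rgb, depth):
        print([round(v, 3) for v in row])
```

Because the gate lies strictly between 0 and 1, each fused value falls between the two input values at that pixel, so a noisy depth prediction can be down-weighted without being discarded outright.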

Attentive Cross-Modal Fusion Network for RGB-D Saliency Detection

FCMNet: Frequency-aware Cross-Modality Attention Networks for RGB-D Salient Object Detection

CMCLNet Cross-Modality Attention Fusion and Cross-Level Feature Interaction for RGBD salient object detection

RGB-D Salient Object Detection Based on Cross-Modal and Cross-Level Feature Fusion

AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection

Learning Selective Mutual Attention and Contrast for RGB-D Saliency Detection

A Cross-Modal Adaptive Gated Fusion Generative Adversarial Network for RGB-D Salient Object Detection

Cross-modal Attention Fusion Network for RGB-D Semantic Segmentation

Adaptive Fusion for RGB-D Salient Object Detection

Dual Attention Guided Multi-Scale Fusion Network for RGB-D Salient Object Detection

RGB-D Salient Object Detection Method Based on Multi-Modal Fusion and Contour Guidance

RGB-D salient object detection via cross-modal joint feature extraction and low-bound fusion loss

Cross-modal and Cross-level Attention Interaction Network for Salient Object Detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

CFRNet: Cross-Attention-Based Fusion and Refinement Network for Enhanced RGB-T Salient Object Detection

Feature Enhancement and Fusion for RGB-T Salient Object Detection

AGRFNet: Two-stage Cross-Modal and Multi-Level Attention Gated Recurrent Fusion Network for RGB-D Saliency Detection

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

RGB-D Saliency Detection with 3D Cross-modal Fusion and Mid-level Integration

Multi-modality information refinement fusion network for RGB-D salient object detection

JL-DCF: Joint Learning and Densely-Cooperative Fusion Framework for RGB-D Salient Object Detection