CNN-Based RGB-D Salient Object Detection: Learn, Select, and Fuse

Hao Chen, Youfu Li, Yongjian Deng, Guosheng Lin
DOI: https://doi.org/10.1007/s11263-021-01452-0
Impact factor (IF): 13.369
2021-05-05
International Journal of Computer Vision
Abstract: The goal of this work is to present a systematic solution for RGB-D salient object detection, which addresses the following three aspects with a unified framework: modal-specific representation learning, complementary cue selection, and cross-modal complement fusion. To learn discriminative modal-specific features, we propose a hierarchical cross-modal distillation scheme, in which we use the progressive predictions from the well-learned source modality to supervise learning feature hierarchies and inference in the new modality. To better select complementary cues, we formulate a residual function to incorporate complements from the paired modality adaptively. Furthermore, a top-down fusion structure is constructed for sufficient cross-modal cross-level interactions. The experimental results demonstrate the effectiveness of the proposed cross-modal distillation scheme in learning from a new modality, the advantages of the proposed multi-modal fusion pattern in selecting and fusing cross-modal complements, and the generalization of the proposed designs in different tasks.
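The abstract names two mechanisms: hierarchical cross-modal distillation and residual complement selection. The NumPy sketch below illustrates both ideas under stated assumptions. It is not the authors' implementation. The following choices are all hypothetical and made only for illustration:

- RGB is treated as the well-learned source modality and depth as the new modality.
- The per-level binary cross-entropy loss is our own choice.
- The residual branch g(x) = ReLU(W x) is our own choice.
- The function names are our own.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def bce(pred, target, eps=1e-7):
    """Mean binary cross-entropy between two probability maps of equal shape."""
    pred = np.clip(pred, eps, 1.0 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred)))

def hierarchical_distillation_loss(depth_logits, rgb_probs):
    """Hypothetical distillation loss: the depth-stream side output at each
    hierarchy level is supervised by the (assumed) RGB source stream's
    saliency prediction at the same level, used as a soft target.
    depth_logits, rgb_probs: lists of 2-D arrays, one entry per level."""
    assert len(depth_logits) == len(rgb_probs)
    return sum(bce(sigmoid(d), r) for d, r in zip(depth_logits, rgb_probs))

def residual_complement_fusion(f_rgb, f_depth, w_sel):
    """Hypothetical residual selection: the paired modality contributes only
    an additive residual g(f_depth) = ReLU(w_sel @ f_depth) on top of f_rgb.
    f_rgb, f_depth: (C, N) feature matrices; w_sel: (C, C) weights."""
    residual = np.maximum(w_sel @ f_depth, 0.0)  # (C, C) @ (C, N) -> (C, N)
    return f_rgb + residual
```

In this residual form, a branch that contributes nothing leaves the RGB features unchanged. The paired modality therefore only adds whatever the (here hypothetical) learned weights select as complementary. Likewise, the distillation loss is smallest when each depth side output agrees with the RGB prediction at the same level.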
Subject category: Computer Science, Artificial Intelligence