Discriminative Cross-Modal Transfer Learning and Densely Cross-Level Feedback Fusion for RGB-D Salient Object Detection

Hao Chen, Youfu Li, Dan Su
DOI: https://doi.org/10.1109/tcyb.2019.2934986
IF: 11.8
2020-11-01
IEEE Transactions on Cybernetics
Abstract: This article addresses two key issues in RGB-D salient object detection based on the convolutional neural network (CNN): 1) how to bridge the gap between the "data-hungry" nature of CNNs and the insufficient labeled training data in the depth modality, and 2) how to take full advantage of the complementary information between the two modalities. To solve the first problem, we model depth-induced saliency detection as a CNN-based cross-modal transfer learning problem. Instead of directly adopting the RGB CNN as initialization, we additionally train a modality classification network (MCNet) that encourages discriminative modality-specific representations by minimizing the modality classification loss. To solve the second problem, we propose a densely cross-level feedback topology, in which the cross-modal complements are combined at each level and then densely fed back to all shallower layers for sufficient cross-level interactions. Compared to traditional two-stream frameworks, the proposed framework can better explore, select, and fuse cross-modal cross-level complements. Experiments show significant and consistent improvements of the proposed CNN framework over other state-of-the-art methods.
automation & control systems, computer science, cybernetics, artificial intelligence
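
The two ideas named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the binary cross-entropy form of the modality classification loss, the label encoding (0 = RGB, 1 = depth), the element-wise sum used as a stand-in for learned fusion layers, the 1-D feature vectors, and the function names modality_classification_loss, upsample, and dense_feedback_fusion are all assumptions made for illustration.

```python
import math


def modality_classification_loss(logits, labels):
    """Mean sigmoid cross-entropy over modality labels.

    Assumed encoding: label 0 = RGB feature, label 1 = depth feature.
    """
    total = 0.0
    for z, y in zip(logits, labels):
        # Numerically stable form of -[y*log(s) + (1-y)*log(1-s)], s = sigmoid(z).
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def upsample(vec, factor):
    """Nearest-neighbour upsampling of a 1-D feature vector."""
    return [v for v in vec for _ in range(factor)]


def dense_feedback_fusion(rgb_feats, depth_feats):
    """Fuse two feature pyramids with dense deep-to-shallow feedback.

    Both inputs are lists ordered shallow -> deep, where each level is
    half the length of the previous one (an assumption of this sketch).
    Returns the fused feature of every level, shallow -> deep.
    """
    n = len(rgb_feats)
    fused = [None] * n
    for i in range(n - 1, -1, -1):  # start at the deepest level
        # Combine the cross-modal complements at this level
        # (element-wise sum stands in for a learned fusion layer).
        combined = [r + d for r, d in zip(rgb_feats[i], depth_feats[i])]
        # Dense feedback: every deeper fused feature is upsampled to this
        # resolution and added, not only the adjacent deeper level.
        for j in range(i + 1, n):
            fed_back = upsample(fused[j], 2 ** (j - i))
            combined = [c + f for c, f in zip(combined, fed_back)]
        fused[i] = combined
    return fused


if __name__ == "__main__":
    rgb = [[1, 1, 1, 1], [2, 2], [3]]
    depth = [[0, 0, 0, 0], [1, 1], [1]]
    print(dense_feedback_fusion(rgb, depth))
    print(modality_classification_loss([0.0, 0.0], [0, 1]))
```

In this sketch the shallowest level receives feedback from both deeper levels, which is what distinguishes a dense topology from a plain top-down pathway where each level only sees its immediate deeper neighbour.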