Learnable Depth-Sensitive Attention for Deep RGB-D Saliency Detection with Multi-modal Fusion Architecture Search
Peng Sun, Wenhu Zhang, Songyuan Li, Yilin Guo, Congli Song, Xi Li
DOI: https://doi.org/10.1007/s11263-022-01646-0
IF: 13.369
2022-09-08
International Journal of Computer Vision
Abstract: RGB-D salient object detection (SOD) is usually formulated as a classification or regression problem over two modalities, i.e., RGB and depth. Hence, both effective RGB-D feature modeling and multi-modal feature fusion play a vital role in RGB-D SOD. In this paper, we propose a depth-sensitive RGB feature modeling scheme that uses the depth-wise geometric prior of salient objects. The scheme is carried out in a Depth-Sensitive Attention Module (DSAM), which enhances RGB features and reduces background distraction by capturing the depth geometry prior. Furthermore, we extend and enhance the original DSAM into DSAMv2 by proposing a novel Depth Attention Generation Module (DAGM) that generates learnable depth attention maps for more robust depth-sensitive RGB feature extraction. Moreover, to perform effective multi-modal feature fusion, we present an automatic neural architecture search approach for RGB-D SOD, which finds a feasible architecture within our specially designed multi-modal multi-scale search space. Extensive experiments on nine standard benchmarks demonstrate the effectiveness of the proposed approach against state-of-the-art methods. We name the enhanced learnable Depth-Sensitive Attention and Automatic multi-modal Fusion framework DSA²Fv2.
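The core idea of depth-sensitive attention, using depth geometry to reweight RGB features so that regions at some depths are emphasized and background regions are suppressed, can be illustrated with a small NumPy sketch. This is not the authors' DSAM or DAGM. It assumes a depth map normalized to [0, 1], a fixed partition into uniform depth intervals, and hypothetical hand-set per-interval weights (`bin_weights`). DSAMv2 instead learns its depth attention maps through the DAGM. The function names `depth_bin_masks` and `depth_sensitive_attention` are illustrative only.

```python
import numpy as np


def depth_bin_masks(depth: np.ndarray, num_bins: int) -> np.ndarray:
    """Partition a depth map (H, W) with values in [0, 1] into binary masks (K, H, W).

    Assumption: uniform depth intervals. The paper's module may partition depth differently.
    """
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    inner_edges = np.linspace(0.0, 1.0, num_bins + 1)[1:-1]
    # np.digitize assigns each pixel its interval index 0..num_bins-1
    idx = np.digitize(depth, inner_edges)
    return np.stack([(idx == k).astype(np.float32) for k in range(num_bins)])


def depth_sensitive_attention(rgb_feat: np.ndarray, depth: np.ndarray,
                              bin_weights: np.ndarray) -> np.ndarray:
    """Reweight RGB features (C, H, W) by a depth-derived spatial attention map.

    bin_weights (K,) are hypothetical per-interval weights. In a trained model
    the attention would be learned rather than hand-set.
    """
    masks = depth_bin_masks(depth, len(bin_weights))          # (K, H, W)
    # Each pixel takes the weight of the depth interval it falls into
    attention = np.tensordot(bin_weights, masks, axes=1)       # (H, W)
    # Residual-style enhancement, broadcast over the channel axis
    return rgb_feat * (1.0 + attention)[None, :, :]


# Usage: in this toy example smaller depth values are assumed to be closer to the camera.
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 4, 4)).astype(np.float32)
depth = np.array([[0.1, 0.2, 0.8, 0.9]] * 4, dtype=np.float32)
weights = np.array([1.0, 0.0, -0.5], dtype=np.float32)  # boost near, damp far
out = depth_sensitive_attention(feat, depth, weights)
```

With these weights, features in the two near columns are doubled and features in the two far columns are halved. This mirrors, in a very simplified form, how a depth prior can amplify likely-salient foreground while reducing background distraction.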
computer science, artificial intelligence