An adaptive guidance fusion network for RGB-D salient object detection

Haodong Sun, Yu Wang, Xinpeng Ma
DOI: https://doi.org/10.1007/s11760-023-02775-w
IF: 1.583
2023-12-01
Signal, Image and Video Processing
Abstract: RGB-D salient object detection (RGB-D SOD) has recently attracted much attention for its broad application prospects. Building on the "encoder-decoder" paradigm of the fully convolutional network (FCN), many FCN-based strategies have emerged and achieved great progress, but they underestimate the potential of the level-specific characteristics of multi-modal features. In this paper, we propose the adaptive guided fusion network (AGFNet) to further mine the potential information shared between the depth image and the RGB image, and we design an adaptive fusion and coarse-to-fine decoding strategy to achieve high-precision detection of salient objects. Specifically, we first use a two-stream encoder to extract multi-level features from the RGB image and the depth image, but we refrain from the previous practice of using depth features at every layer. Second, a simple but effective multi-modal selective fusion strategy is designed to fuse the multi-level features. Third, to adaptively enhance the contextual information of each level, an adaptive cross fusion module (ACFM) fuses the features at all levels and outputs a coarse saliency map. Finally, a guided attention refinement module (GARM) uses the coarse saliency map to guide the final features from ACFM, enhancing them to obtain a refined saliency map. Our method is compared with other state-of-the-art RGB-D SOD methods through extensive experiments, and the results demonstrate the superiority of the proposed AGFNet. The source code of this project is available at https://github.com/HaodongSun809/my_AGFNet.git.
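To make the coarse-to-fine idea concrete, the following is a minimal sketch of how a coarse saliency map might guide a feature tensor. It is not the GARM from the paper, whose internal operations the abstract does not specify. It assumes a generic sigmoid-gated residual form, refined = F + F * sigmoid(S), and the names `sigmoid` and `guided_refine` are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic function mapping a saliency logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def guided_refine(features, coarse_logits):
    """Re-weight features with a coarse saliency map (assumed form, not the paper's GARM).

    features:      list of C channels, each an H x W nested list of floats.
    coarse_logits: H x W nested list of coarse saliency logits.
    Returns a list with the same C x H x W shape.
    """
    # Spatial attention in (0, 1), shared by every channel.
    attention = [[sigmoid(v) for v in row] for row in coarse_logits]
    refined = []
    for channel in features:
        # Residual gating: f + f * a, so features are boosted where the
        # coarse map is confident and left nearly unchanged elsewhere.
        refined.append([
            [f * (1.0 + a) for f, a in zip(f_row, a_row)]
            for f_row, a_row in zip(channel, attention)
        ])
    return refined
```

The residual term keeps the original features intact where the coarse map is uncertain, so an inaccurate coarse prediction attenuates the enhancement rather than erasing information.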
engineering, electrical & electronic; imaging science & photographic technology