Cross-modal refined adjacent-guided network for RGB-D salient object detection
Hongbo Bi, Jiayuan Zhang, Ranwan Wu, Yuyu Tong, Wei Jin
DOI: https://doi.org/10.1007/s11042-023-14421-1
IF: 2.577
2023-03-22
Multimedia Tools and Applications
Abstract: RGB and depth modalities can be jointly exploited to recognize the most eye-catching objects in different scenes, which has made RGB-D salient object detection (RGB-D SOD) a popular research direction. In recent years in particular, many new RGB-D SOD algorithms have been proposed and have achieved outstanding performance. However, most approaches adopt the common pyramid structure to integrate multi-scale cues but ignore the complementarity of features across layers. Besides, it remains challenging to fully utilize RGB and depth information for cross-modal interaction. To address these shortcomings, we propose CRA-Net (Cross-modal Refined Adjacent-guided Network), which uses the high-level semantic information contained in the high layers to guide the local details in the low layers and thereby improve detection accuracy. Specifically, a multiplier refinement module (MRM) is proposed to carry out the information interaction between the two modalities, in which a five-layer refinement mechanism is adopted to enhance cross-modal fusion representations. Moreover, to suppress the interference of non-salient factors in low-level backgrounds, we design an adjacent-guided aggregation module (AAM). The multi-level features are fed in groups into two AAMs with identical structures, which use an adjacent-layer guidance strategy to guide the aggregation of multi-scale features from deep to shallow. Extensive experiments show that CRA-Net is competitive on four common evaluation metrics across four popular datasets.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering