Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection

Zhengxuan Xie, Feng Shao, Gang Chen, Hangwei Chen, Qiuping Jiang, Xiangchao Meng, Yo-Sung Ho
DOI: https://doi.org/10.1109/tcsvt.2023.3241196
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: RGB-T salient object detection (SOD) aims to detect and segment salient regions in RGB images and their corresponding thermal maps. The ability to alleviate the difference between the RGB and thermal modalities plays a vital role in the development of RGB-T SOD. However, most existing methods either integrate multi-modal information through various fusion strategies or reduce the modality difference via unidirectional or undifferentiated bidirectional interaction, and they fail in some challenging scenes. To address this problem, a novel Cross-Modality Double Bidirectional Interaction and Fusion Network (CMDBIF-Net) for RGB-T SOD is proposed. Specifically, we construct an interactive branch to indirectly bridge the RGB and thermal modalities. In addition, we propose a double bidirectional interaction (DBI) module, composed of a forward interaction block (FIB) and a backward interaction block (BIB), to reduce the cross-modality differences. Moreover, a multi-scale feature enhancement and fusion (MSFEF) module is introduced to integrate the multi-modal features while accounting for the internal gap between the modalities. Finally, we use a cascaded decoder and a cross-level feature enhancement (CLFE) module to generate high-quality saliency maps. Extensive experiments conducted on three publicly available RGB-T SOD datasets show that the proposed CMDBIF-Net achieves outstanding performance against state-of-the-art (SOTA) RGB-T SOD methods.
engineering, electrical & electronic