Modality-Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection

Gang Chen, Feng Shao, Xiongli Chai, Hangwei Chen, Qiuping Jiang, Xiangchao Meng, Yo-Sung Ho
DOI: https://doi.org/10.1109/tcsvt.2022.3215979
IF: 5.859
2023-04-08
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: The ability to capture the complementary information in multi-modality data is critical to the development of multi-modality salient object detection (SOD). Most existing studies attempt to integrate multi-modality information through various fusion strategies. However, most of these methods ignore the inherent differences in multi-modality data, resulting in poor performance in challenging scenarios. In this paper, we propose a novel Modality-Induced Transfer-Fusion Network (MITF-Net) for RGB-D and RGB-T SOD that fully explores the complementarity in multi-modality data. Specifically, we first deploy a modality transfer fusion (MTF) module to bridge the semantic gap between single-modality and multi-modality data, and then mine the cross-modality complementarity based on point-to-point structural similarity information. Next, we design a cycle-separated attention (CSA) module to optimize the cross-layer information recurrently, and measure the effectiveness of cross-layer features through point-wise convolution-based multi-scale channel attention. Furthermore, we refine the boundaries in the decoding stage to obtain high-quality saliency maps with sharp boundaries. Extensive experiments on 13 RGB-D and RGB-T SOD datasets show that the proposed MITF-Net achieves competitive performance.
engineering, electrical & electronic
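The abstract mentions point-wise convolution-based multi-scale channel attention without giving its details. The NumPy sketch below illustrates one common way such a block can be built. It is not the authors' implementation. The two-branch layout (a per-pixel local branch plus a globally pooled branch), the bottleneck reduction ratio, the ReLU and sigmoid nonlinearities, and all function and parameter names are assumptions made for illustration.

```python
import numpy as np

def pointwise_conv(x, w, b):
    """1x1 convolution: mixes channels independently at every pixel.
    x: (C_in, H, W), w: (C_out, C_in), b: (C_out,)."""
    return np.einsum("oc,chw->ohw", w, x) + b[:, None, None]

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def multi_scale_channel_attention(x, params):
    """Hypothetical block: local and global channel context, each computed
    by a bottleneck of two point-wise convolutions, then fused by a sigmoid."""
    (w1l, b1l, w2l, b2l), (w1g, b1g, w2g, b2g) = params
    # Local branch: keeps the spatial resolution, output shape (C, H, W).
    hidden_l = np.maximum(pointwise_conv(x, w1l, b1l), 0.0)  # ReLU
    local = pointwise_conv(hidden_l, w2l, b2l)
    # Global branch: average-pool to (C, 1, 1) before the point-wise convs.
    pooled = x.mean(axis=(1, 2), keepdims=True)
    hidden_g = np.maximum(pointwise_conv(pooled, w1g, b1g), 0.0)  # ReLU
    glob = pointwise_conv(hidden_g, w2g, b2g)
    # Broadcasting adds the (C, 1, 1) global term to every pixel.
    weights = sigmoid(local + glob)
    return x * weights, weights

def init_params(channels, reduction, rng):
    """Random weights and zero biases for both branches (assumed initialisation)."""
    mid = max(channels // reduction, 1)
    def branch():
        return (rng.standard_normal((mid, channels)) * 0.1, np.zeros(mid),
                rng.standard_normal((channels, mid)) * 0.1, np.zeros(channels))
    return branch(), branch()

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 16, 16))  # stand-in for a cross-layer feature map
params = init_params(channels=8, reduction=4, rng=rng)
refined, weights = multi_scale_channel_attention(features, params)
```

Here `weights` holds one value in (0, 1) per channel and pixel, so `refined` is the input feature map rescaled by how useful each channel is judged to be at each location. In a trained network the convolution weights would be learned rather than random.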