Modality-Aware Adaptive-Integration Guided Single-Stream Network for RGB-T Saliency Detection

Yu Pang, Longkun Zhang, Yang Huang, Xiaosheng Yu
DOI: https://doi.org/10.21203/rs.3.rs-5440861/v1
2024-01-01
Abstract: RGB-T saliency detection has recently become a hot topic in the saliency detection field. However, existing works (especially CNN-based methods) usually use a two-stream structure to extract saliency cues separately from RGB and thermal infrared images and then integrate them into the final detection result. This strategy greatly increases the parameter scale, and the multi-modal fusion results are also very sensitive to the quality of the two modalities. Based on this observation, we develop a novel Modality-Aware Adaptive-Integration Guided Single-Stream Network (MAANet) to detect salient objects from RGB-T image pairs. The feature pyramid network (FPN) is adopted as the basic structure of MAANet. The two complementary modalities are fused as follows: (1) In the encoder: RGB and thermal infrared images are concatenated into a 4-channel input to the encoder. (2) In the decoder: we propose a novel Modality-Aware Adaptive-Integration based Attention Mechanism (MAAM) that enables the decoder to perform the fusion of the two modalities optimally and produce more accurate saliency predictions. (3) Finally: a novel coarse-and-refined bidirectional optimization (CRBO) method is proposed to suppress irrelevant background regions in the saliency map generated by the decoder. Compared with previous RGB-T methods, the proposed MAANet takes better advantage of both modalities and is not sensitive to any single modality; it is also more lightweight than previous works. Extensive experiments demonstrate that the proposed model performs favorably against most state-of-the-art RGB-T methods under different evaluation metrics, and even outperforms most RGB and RGB-D methods.
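The encoder-side fusion in step (1) can be illustrated with a minimal sketch. It assumes the thermal infrared image is a single-channel map with the same height and width as the RGB image, which is one natural reading of "4-channel input" (3 + 1). The function name `concat_rgbt` and the list-of-rows image layout are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
def concat_rgbt(rgb, thermal):
    """Stack an H x W x 3 RGB image and an H x W thermal map into H x W x 4.

    Images are plain nested lists: rgb[y][x] is an (R, G, B) tuple and
    thermal[y][x] is a single intensity value (assumed layout).
    """
    same_height = len(rgb) == len(thermal)
    same_width = all(len(r) == len(t) for r, t in zip(rgb, thermal))
    if not (same_height and same_width):
        raise ValueError("RGB and thermal images must share height and width")
    # Append the thermal value as a fourth channel at every pixel.
    return [
        [tuple(pixel) + (t_value,) for pixel, t_value in zip(rgb_row, t_row)]
        for rgb_row, t_row in zip(rgb, thermal)
    ]


# Toy 2 x 2 image pair.
rgb = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (10, 20, 30)]]
thermal = [[37, 40], [35, 90]]
fused = concat_rgbt(rgb, thermal)
```

Because the two modalities are merged before the first layer, a single encoder processes both, which is where the parameter saving over a two-stream design would come from.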