Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection

Chengtao Lv, Bin Wan, Xiaofei Zhou, Yaoqi Sun, Jiyong Zhang, Chenggang Yan
DOI: https://doi.org/10.3390/e26020130
IF: 2.738
2024-02-01
Entropy
Abstract: RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works are based on heavy models, which are not applicable to mobile devices. Additionally, there is still room for improvement in the design of cross-modal feature fusion and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, the cross-modal information mutual reinforcement (CMIMR) module, and the semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ the lightweight module in both the encoder and decoder. Furthermore, to fuse the complementary information between two-modal features, we design the CMIMR module to enhance the two-modal features. This module effectively refines the two-modal features by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse the cross-level feature and detect multiscale salient objects, we design the SIGF module, which effectively suppresses the background noisy information in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared to the other 15 state-of-the-art methods.
physics, multidisciplinary
What problem does this paper attempt to address?
The paper addresses two main issues:

1. **Existing models are too heavy**: Most existing RGB-T salient object detection (SOD) methods rely on complex, computationally expensive models that are not suitable for mobile devices.
2. **Insufficient design of cross-modal and cross-level feature fusion**: Existing methods leave room for improvement in both kinds of fusion, especially in how to effectively integrate the complementary information between the two modalities.

To address these issues, the authors propose a Lightweight Cross-Modal Information Mutual Reinforcement Network, aiming at efficient RGB-T salient object detection suitable for mobile devices. The network includes the following key components:

- **Lightweight Encoder**: Uses MobileNet-V2 as the backbone network to reduce the number of parameters and the computational cost.
- **Cross-Modal Information Mutual Reinforcement Module (CMIMR)**: Enhances the features of both modalities by absorbing semantic information from the previous-level decoded features together with the complementary information between the modalities.
- **Semantic Information Guided Fusion Module (SIGF)**: Fuses cross-level features during the decoding stage, suppressing background noise in low-level features and extracting multiscale information.

Through these designs, the authors aim to make the model lighter and better suited to resource-constrained devices while maintaining high performance.
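
The summary does not say where a MobileNet-V2 backbone saves parameters. Much of the saving comes from the depthwise separable convolutions that MobileNet-V2 is built from, which replace one dense k x k convolution with a per-channel k x k convolution followed by a 1 x 1 pointwise convolution. The sketch below only illustrates that arithmetic. The channel sizes (64 and 128) and the function names are hypothetical choices for illustration and are not taken from the paper.

```python
def conv_params(c_in, c_out, k=3):
    """Weight count of a standard k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in, c_out, k=3):
    """Weight count of a k x k depthwise conv plus a 1 x 1 pointwise conv (bias ignored)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 conv that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, chosen only for illustration.
c_in, c_out = 64, 128
standard = conv_params(c_in, c_out)
separable = depthwise_separable_params(c_in, c_out)
ratio = standard / separable

print(standard, separable, round(ratio, 2))  # 73728 8768 8.41
```

For this hypothetical layer the separable form needs roughly one eighth of the weights. This is the general kind of saving a lightweight backbone relies on, and it is not a figure reported for the proposed network.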