Lightweight Cross-Modal Information Mutual Reinforcement Network for RGB-T Salient Object Detection

Chengtao Lv, Bin Wan, Xiaofei Zhou, Yaoqi Sun, Jiyong Zhang, Chenggang Yan

DOI: https://doi.org/10.3390/e26020130

IF: 2.738

2024-02-01

Entropy

Abstract: RGB-T salient object detection (SOD) has made significant progress in recent years. However, most existing works are based on heavy models, which are not applicable to mobile devices. Additionally, there is still room for improvement in the design of cross-modal feature fusion and cross-level feature fusion. To address these issues, we propose a lightweight cross-modal information mutual reinforcement network for RGB-T SOD. Our network consists of a lightweight encoder, the cross-modal information mutual reinforcement (CMIMR) module, and the semantic-information-guided fusion (SIGF) module. To reduce the computational cost and the number of parameters, we employ the lightweight module in both the encoder and decoder. Furthermore, to fuse the complementary information between two-modal features, we design the CMIMR module to enhance the two-modal features. This module effectively refines the two-modal features by absorbing previous-level semantic information and inter-modal complementary information. In addition, to fuse the cross-level feature and detect multiscale salient objects, we design the SIGF module, which effectively suppresses the background noisy information in low-level features and extracts multiscale information. We conduct extensive experiments on three RGB-T datasets, and our method achieves competitive performance compared to the other 15 state-of-the-art methods.

physics, multidisciplinary

What problem does this paper attempt to address?

The paper attempts to address two main issues:

1. **Existing models are too heavy**: Most existing RGB-T salient object detection (SOD) methods rely on complex, computationally expensive models that are not suitable for mobile devices.
2. **Insufficient design of cross-modal and cross-level feature fusion**: Existing methods leave room for improvement in both kinds of fusion, especially in how they integrate the complementary information between the two modalities.

To address these issues, the authors propose a Lightweight Cross-Modal Information Mutual Reinforcement Network, aiming at efficient RGB-T salient object detection suitable for mobile devices. The network includes the following key components:

- **Lightweight Encoder**: Uses MobileNet-V2 as the backbone network to reduce the number of parameters and the computational cost.
- **Cross-Modal Information Mutual Reinforcement Module (CMIMR)**: Enhances the features of both modalities by absorbing semantic information from the previous-level decoding features and complementary information from the other modality.
- **Semantic Information Guided Fusion Module (SIGF)**: Fuses cross-level features during the decoding stage, suppressing background noise in low-level features and extracting multi-scale information.

Through these designs, the authors aim to make the model lighter and better suited to resource-constrained devices while maintaining high performance.
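To make the "lightweight" claim concrete, the following minimal sketch compares the weight count of a standard convolution with that of a depthwise separable convolution, the building block that MobileNet-V2 relies on. The source does not say which lightweight module the authors use in the decoder, so applying this to the paper is an assumption; the function names and the channel widths (64 in, 64 out) are hypothetical and chosen only for illustration.

```python
def conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv (bias omitted)."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical channel widths, not taken from the paper.
    c_in, c_out = 64, 64
    standard = conv_params(c_in, c_out)
    separable = depthwise_separable_params(c_in, c_out)
    print(f"standard: {standard}, separable: {separable}, "
          f"ratio: {standard / separable:.2f}")
```

For these example widths the separable layer needs 4,672 weights against 36,864 for the standard layer, roughly an eightfold reduction. Savings of this kind are what make a MobileNet-style backbone attractive for mobile deployment.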

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

Lightweight Multi-modal Representation Learning for RGB Salient Object Detection

Cross-Modality Double Bidirectional Interaction and Fusion Network for RGB-T Salient Object Detection

Interactive Context-Aware Network for RGB-T Salient Object Detection

RGB-D Salient Object Detection with Cross-Modality Modulation and Selection

Modality-Induced Transfer-Fusion Network for RGB-D and RGB-T Salient Object Detection

Enabling modality interactions for RGB-T salient object detection

Cross-Modal Weighting Network for RGB-D Salient Object Detection

Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection

Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection

Efficient Context-Guided Stacked Refinement Network for RGB-T Salient Object Detection

Middle-level Fusion for Lightweight RGB-D Salient Object Detection

Cross-modal and multi-level feature refinement network for RGB-D salient object detection

Multi-level cross-modal interaction network for RGB-D salient object detection

MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

MSEDNet: Multi-scale fusion and edge-supervised network for RGB-T salient object detection

CGFNet: Cross-Guided Fusion Network for RGB-T Salient Object Detection

Cross-modal refined adjacent-guided network for RGB-D salient object detection

RGBD Salient Object Detection via Disentangled Cross-modal Fusion

Multi-modality information refinement fusion network for RGB-D salient object detection