Abstract: With the prevalence of thermal cameras, RGB-T multi-modal data have become more widely available for salient object detection (SOD) in complex scenes. Most RGB-T SOD works first extract RGB and thermal features individually with two separate encoders and then integrate them directly, paying little attention to the issue of defective modalities. Such an indiscriminate feature extraction strategy may produce contaminated features and thus lead to poor SOD performance. To address this issue, we propose a novel network, CCFENet, which aims to perform robust and accurate multi-modal expression encoding. First, we propose an essential cross-collaboration enhancement strategy (CCE), which concentrates on facilitating interactions across the encoders and encouraging the different modalities to complement each other during encoding. Such a cross-collaborative-encoder paradigm induces our network to collaboratively suppress the negative feature responses of defective modality data and to effectively exploit modality-informative features. Moreover, as the network goes deeper, we embed several CCEs into the encoder, further enabling more representative and robust feature generation. Second, benefiting from the proposed robust encoding paradigm, a simple yet effective cross-scale cross-modal decoder (CCD) is designed to aggregate multi-level complementary multi-modal features, thereby enabling efficient and accurate RGB-T SOD. Extensive experiments reveal that our CCFENet outperforms state-of-the-art models on three RGB-T datasets with a fast inference speed of 62 FPS. In addition, the advantages of our approach in complex scenarios (e.g., bad weather and motion blur) and on RGB-D SOD further verify its robustness and generality. The source code will be publicly available via our project page: https://git.openi.org.cn/OpenVision/CCFENet
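
The abstract does not specify how a CCE is built, so the following is only a minimal illustrative sketch of the general idea of letting two encoder streams exchange features while limiting the influence of a weak modality. It is not the authors' implementation. The function name `cross_gate`, the use of the mean channel activation as a reliability proxy, and the residual gating formula are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_means(feat):
    # feat: list of channels, each a flat list of spatial activations
    return [sum(ch) / len(ch) for ch in feat]


def cross_gate(rgb, thermal):
    """Hypothetical cross-modal gating step (an assumption, not the paper's CCE).

    For each channel, a gate g in (0, 1) is computed from the difference of the
    mean activations of the two streams. Each stream keeps its own features and
    receives the other stream's features scaled by how strongly that other
    stream responds, so a weak (possibly defective) stream contributes little.
    """
    if len(rgb) != len(thermal):
        raise ValueError("both streams must have the same number of channels")
    rgb_out, thermal_out = [], []
    means = zip(channel_means(rgb), channel_means(thermal))
    for r_ch, t_ch, (r_m, t_m) in zip(rgb, thermal, means):
        g = sigmoid(r_m - t_m)  # relative trust in the RGB channel
        # RGB stream is complemented by thermal features, weighted by (1 - g)
        rgb_out.append([r + (1.0 - g) * t for r, t in zip(r_ch, t_ch)])
        # thermal stream is complemented by RGB features, weighted by g
        thermal_out.append([t + g * r for r, t in zip(r_ch, t_ch)])
    return rgb_out, thermal_out


# Channel 0: RGB is blank (e.g. a dark scene), thermal responds strongly.
rgb = [[0.0, 0.0], [1.0, 1.0]]
thermal = [[2.0, 2.0], [1.0, 1.0]]
rgb_enh, thermal_enh = cross_gate(rgb, thermal)
```

In this toy setting the blank RGB channel is filled mostly from the thermal stream, while the thermal channel is left unchanged by the empty RGB response. A real encoder would learn such gates from data rather than derive them from raw activation means.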

Highly Efficient RGB-D Salient Object Detection with Adaptive Fusion and Attention Regulation

Hybrid Attention Mechanism and Forward Feedback Unit for RGB-D Salient Object Detection

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

An adaptive guidance fusion network for RGB-D salient object detection

Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection

TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection

A Single Stream Network for Robust and Real-Time RGB-D Salient Object Detection

Multi-modality information refinement fusion network for RGB-D salient object detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection

Dynamic Selective Network for RGB-D Salient Object Detection

ECW-EGNet: Exploring Cross-Modal Weighting and edge-guided decoder network for RGB-D salient object detection

HFENet: Hybrid feature encoder network for detecting salient objects in RGB-thermal images

CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection

Unified Information Fusion Network for Multi-Modal RGB-D and RGB-T Salient Object Detection

MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

Cross-Collaborative Fusion-Encoder Network for Robust RGB-Thermal Salient Object Detection

Middle-Level Feature Fusion for Lightweight RGB-D Salient Object Detection