Abstract: Deep convolutional neural networks (CNNs) have gained prominence in computer vision applications, including RGB salient object detection (SOD), owing to advances in deep learning. Nevertheless, most deep CNNs employ either VGGNet or ResNet as the backbone architecture for extracting image information. This approach may lead to the following problems. 1) Variations between imaging modalities during feature extraction across layers: cross-modal features across layers are often fused in a single step, resulting in inadequate cross-modal feature extraction. 2) Long-range feature dependence in multilayer feature decoding. 3) Image boundary blurring. To address these issues, we leverage the advantages of both the VGGNet and ResNet architectures and present a novel hybrid VGG–ResNet feature encoder for RGB-T SOD. Specifically, we introduce a geometry information aggregation module that combines and enhances the VGGNet spatial features of the RGB-T modalities from the bottom to the top. Moreover, we propose an innovative global saliency perception module that progressively refines the ResNet semantic features from the top to the bottom by integrating both local and global information. Furthermore, we introduce a Pearson-gated module to tackle the challenge of long-range dependence between features. This module uses gating to merge features by calculating the Pearson correlation coefficients of the fused features at multiple levels. Lastly, we devise an edge-aware module to precisely learn the contours of salient objects, thereby sharpening the object boundaries. Extensive experiments conducted on three RGB-T SOD benchmarks demonstrate that our proposed network surpasses state-of-the-art SOD methods.
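The abstract does not give the exact formulation of the Pearson-gated module, so the following is only a minimal sketch of the general idea of gating a feature merge by Pearson correlation. Everything specific here is an assumption made for illustration: the function names `pearson_per_channel` and `pearson_gated_merge`, the per-channel correlation over spatial positions, the mapping of the coefficient from [-1, 1] to a gate in [0, 1], and the additive merge rule. It should not be read as the authors' implementation.

```python
import numpy as np


def pearson_per_channel(a, b, eps=1e-8):
    """Pearson correlation between two (C, H, W) feature maps, one value per channel.

    Hypothetical helper: the paper may compute the coefficient differently.
    """
    c = a.shape[0]
    a_flat = a.reshape(c, -1)
    b_flat = b.reshape(c, -1)
    # Center each channel over its spatial positions.
    a_centered = a_flat - a_flat.mean(axis=1, keepdims=True)
    b_centered = b_flat - b_flat.mean(axis=1, keepdims=True)
    cov = (a_centered * b_centered).sum(axis=1)
    denom = np.sqrt((a_centered ** 2).sum(axis=1) * (b_centered ** 2).sum(axis=1)) + eps
    return cov / denom  # shape (C,), values in [-1, 1]


def pearson_gated_merge(low, high):
    """Merge two same-shaped feature maps, weighting `high` by a correlation gate.

    Assumed merge rule (not from the paper): out = low + gate * high,
    where gate = (r + 1) / 2 maps the coefficient r into [0, 1].
    """
    r = pearson_per_channel(low, high)
    gate = ((r + 1.0) / 2.0)[:, None, None]  # broadcast over H and W
    return low + gate * high


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    low = rng.normal(size=(3, 4, 4))
    high = rng.normal(size=(3, 4, 4))
    merged = pearson_gated_merge(low, high)
    print(merged.shape)
```

Under these assumptions, strongly correlated channels pass the second feature map through almost unchanged, while anti-correlated channels suppress it. In a trained network the gate would more plausibly feed learned layers rather than a fixed rule.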

HDNet: Multi-Modality Hierarchy-Aware Decision Network for RGB-D Salient Object Detection

HODINet: High-Order Discrepant Interaction Network for RGB-D Salient Object Detection

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

HFMDNet: Hierarchical Fusion and Multilevel Decoder Network for RGB-D Salient Object Detection

Hierarchical Dynamic Filtering Network for RGB-D Salient Object Detection

HFENet: Hybrid feature encoder network for detecting salient objects in RGB-thermal images

Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection

HiDAnet: RGB-D Salient Object Detection via Hierarchical Depth Awareness

CFIDNet: cascaded feature interaction decoder for RGB-D salient object detection

Dynamic Selective Network for RGB-D Salient Object Detection

Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection

Hybrid Attention Mechanism and Forward Feedback Unit for RGB-D Salient Object Detection

A Unified Structure for Efficient RGB and RGB-D Salient Object Detection

MFCINet: multi-level feature and context information fusion network for RGB-D salient object detection

Cross-modal refined adjacent-guided network for RGB-D salient object detection

Cross-modality Discrepant Interaction Network for RGB-D Salient Object Detection

Attention-guided cross-modal multiple feature aggregation network for RGB-D salient object detection

MMNet: Multi-Stage and Multi-Scale Fusion Network for RGB-D Salient Object Detection

TANet: Transformer-based Asymmetric Network for RGB-D Salient Object Detection

Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

ECW-EGNet: Exploring Cross-Modal Weighting and edge-guided decoder network for RGB-D salient object detection