Abstract: In recent years, Transformers have been increasingly applied to salient object detection with good results. However, the Transformer’s emphasis on global modeling can lead to the loss of local details that are important for this task. A feature extraction backbone based on a convolutional neural network (CNN) extracts local detail features well because its receptive field expands gradually, but the limited size of that receptive field leaves it less able to extract global semantic features. This paper therefore combines a Transformer with a CNN in a dual-branch encoder so that the extracted features contain both rich global semantic information and local detail. Because the Transformer and the CNN extract different kinds of features, fusing them may introduce noise, so each kind must be processed accordingly during fusion. The proposed fusion enhancement module (FEM) fuses the features of the two branches step by step, using a hybrid attention mechanism to perform a weighted fusion of the different features. This progressive approach minimizes the differences between the features of the two branches so that the merged features retain, as far as possible, the semantic and detail features that each branch extracts. To counter the loss of detail caused by repeated downsampling and to meet the need for accurate contour prediction, we propose an edge refinement module (ERM). This module derives edge features from salient features and gradually refines the prediction by incorporating them. It makes full use of the connection between salient features and edge features and does not introduce an additional edge extraction branch. Extensive experiments on five benchmark datasets demonstrate that our method outperforms existing approaches. Code can be found at https://github.com/gfq1605694825/DSRNet-main.
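The weighted fusion described in the abstract can be illustrated with a small, parameter-free sketch. This is not the paper's FEM: the real module presumably uses learned layers, whereas here the channel and spatial gates are computed directly from average pooling, and every function name (`channel_gates`, `spatial_gates`, `hybrid_attention_fuse`) is a hypothetical chosen for illustration. Feature maps are represented as plain lists of channels, each channel a flat list of spatial positions.

```python
import math


def sigmoid(x):
    """Logistic function; fine for the small magnitudes used here."""
    return 1.0 / (1.0 + math.exp(-x))


def channel_gates(feat):
    """One gate per channel: sigmoid of that channel's global average."""
    return [sigmoid(sum(ch) / len(ch)) for ch in feat]


def spatial_gates(feat):
    """One gate per position: sigmoid of the mean across channels."""
    n_pos = len(feat[0])
    return [sigmoid(sum(ch[i] for ch in feat) / len(feat)) for i in range(n_pos)]


def hybrid_attention_fuse(cnn_feat, trans_feat):
    """Fuse two C x N feature maps with a combined channel/spatial gate.

    Assumption (not from the paper): the gate g = channel_gate * spatial_gate
    is computed from the element-wise sum of both branches, and the output is
    the convex combination g * cnn + (1 - g) * transformer.
    """
    summed = [[a + b for a, b in zip(ca, cb)]
              for ca, cb in zip(cnn_feat, trans_feat)]
    cg = channel_gates(summed)
    sg = spatial_gates(summed)
    fused = []
    for c, (ca, cb) in enumerate(zip(cnn_feat, trans_feat)):
        row = []
        for i, (a, b) in enumerate(zip(ca, cb)):
            g = cg[c] * sg[i]  # hybrid gate, strictly between 0 and 1
            row.append(g * a + (1.0 - g) * b)
        fused.append(row)
    return fused


if __name__ == "__main__":
    # Toy inputs: 2 channels, 3 spatial positions per branch.
    cnn = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
    trans = [[0.0, 1.0, 0.0], [2.0, 2.0, 2.0]]
    print(hybrid_attention_fuse(cnn, trans))
```

Because the gate is a convex weight, each fused value lies between the two branch values at that position, which is one simple way to keep either branch from dominating. A learned version would replace the pooling-based gates with trainable layers.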

PileNet: A high-and-low pass complementary filter with multi-level feature refinement for salient object detection

M$^3$Net: Multilevel, Mixed and Multistage Attention Network for Salient Object Detection

MPTC-FPN: A Multilayer Progressive FPN With Transformer-CNN Based Encoder for Salient Object Detection

Receptive Field Broadening and Boosting for Salient Object Detection

Multi-branch feature fusion and refinement network for salient object detection

Unifying convolution and transformer: a dual stage network equipped with cross-interactive multi-modal feature fusion and edge guidance for RGB-D salient object detection

Transformers and CNNs Fusion Network for Salient Object Detection

Salient object detection with dual-branch stepwise feature fusion and edge refinement

Suppress and Balance: A Simple Gated Network for Salient Object Detection

Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks

AWANet: Attentive-Aware Wide-Kernels Asymmetrical Network with Blended Contour Information for Salient Object Detection

PATNet: Patch-to-pixel attention-aware transformer network for RGB-D and RGB-T salient object detection

A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection

Boosting Salient Object Detection with Transformer-based Asymmetric Bilateral U-Net

Unidirectional RGB-T salient object detection with intertwined driving of encoding and fusion

MAFNet: Multi-style attention fusion network for salient object detection

Point-aware Interaction and CNN-induced Refinement Network for RGB-D Salient Object Detection

DeepSaliency: MultiTask Deep Neural Network Model for Salient Object Detection

Cross-Layer Feature Pyramid Network for Salient Object Detection

Unifying Global-Local Representations in Salient Object Detection with Transformer