Abstract: Salient object detection (SOD) remains an important task in computer vision, with applications ranging from image segmentation to autonomous driving. Fully convolutional network (FCN)-based methods have made remarkable progress in visual saliency detection over the last few decades. However, these methods have limitations in accurately detecting salient objects, particularly in challenging scenes with multiple objects, small objects, or objects with low resolutions. To address this issue, we propose a Saliency Fusion Attention U-Net (SalFAU-Net) model, which incorporates a saliency fusion module into each decoder block of the Attention U-Net model to generate saliency probability maps from each decoder block. SalFAU-Net employs an attention mechanism to selectively focus on the most informative regions of an image and suppress non-salient regions. We train SalFAU-Net on the DUTS dataset using a binary cross-entropy loss function and evaluate it on six popular SOD datasets. The experimental results demonstrate that SalFAU-Net achieves competitive performance compared to other methods in terms of mean absolute error (MAE), F-measure, S-measure, and E-measure.
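Two of the metrics named in the abstract, MAE and F-measure, are simple enough to sketch directly. The snippet below is a minimal NumPy illustration, not the paper's evaluation code: the function names, the fixed 0.5 binarization threshold, and the toy arrays are assumptions made for the example, while `beta2 = 0.3` follows the weighting conventionally used in the SOD literature.

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth (both in [0, 1])."""
    return float(np.mean(np.abs(pred - gt)))

def f_measure(pred, gt, threshold=0.5, beta2=0.3):
    """F-measure at a single fixed threshold (the threshold choice is an assumption)."""
    binary = pred >= threshold
    gt_bin = gt > 0.5
    tp = np.logical_and(binary, gt_bin).sum()
    precision = tp / max(binary.sum(), 1)   # guard against empty predictions
    recall = tp / max(gt_bin.sum(), 1)      # guard against empty ground truth
    denom = beta2 * precision + recall
    if denom == 0:
        return 0.0
    return float((1 + beta2) * precision * recall / denom)

# Toy 2x2 example: one true salient pixel, one false positive.
pred = np.array([[0.9, 0.1], [0.8, 0.2]])
gt = np.array([[1.0, 0.0], [0.0, 0.0]])
mae_value = mae(pred, gt)
f_value = f_measure(pred, gt)
```

For this toy input the MAE is about 0.3, and with precision 0.5 and recall 1.0 the F-measure is about 0.565. Evaluation toolkits typically sweep or adapt the threshold rather than fixing it at 0.5.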

What problem does this paper attempt to address?

The paper addresses **the challenges in Salient Object Detection (SOD)**, especially in complex scenes and for multiple, small, or low-resolution objects, where existing methods have difficulty detecting salient objects accurately. Specifically:

1. **Salient object detection in complex scenes**: Traditional methods and early deep-learning methods perform poorly on complex backgrounds or multi-object scenes and are prone to false positives and false negatives.
2. **Small object detection**: Existing Fully Convolutional Network (FCN) methods often fail to capture small objects, resulting in unsatisfactory detection results.
3. **Low-resolution object detection**: When the image resolution is low, existing methods have difficulty extracting sufficient features to identify salient objects accurately.

To solve these problems, the authors propose a new model named **SalFAU-Net**, which combines an attention mechanism with a Saliency Fusion Module (SFM) to improve the accuracy of salient object detection. The main improvements of this model are:

- **Introduction of the Saliency Fusion Module (SFM)**: A Saliency Fusion Module is added to each decoder block to generate a saliency probability map for that block, and these maps are fused to obtain the final saliency map. This helps capture saliency features of different scales and shapes more accurately.
- **Attention mechanism**: Through the Attention Gate module, the model selectively focuses on the most important regions in the image while suppressing non-salient regions, thereby improving detection accuracy.
- **Multi-level feature fusion**: Multi-level features and context information are used to enhance the model's adaptability to complex scenes.

### Formula Summary

1. **Attention coefficient calculation**:

\[ q_{\text{att}}^l = \Psi\left(\sigma_1\left(W_q^T q_i^l + W_k^T K_i + b_k\right)\right) + b_\Psi \]

\[ \alpha_i^l = \sigma_2\left(q_{\text{att}}^l(q_i^l, K_i; \Theta_{\text{att}})\right) \]

\[ \hat{x}_{i,c}^l = \alpha_i^l \cdot K_{i,c} \]

where $\sigma_2(x_{i,c}) = \frac{1}{1 + \exp(-x_{i,c})}$ is the sigmoid activation function.

2. **Saliency probability map generation**:

\[ S_{\text{side}}^{(i)} = \sigma\left(\text{Conv}^{(i)}(X)\right) \]

\[ S_{\text{fuse}} = \sigma\left(\text{Conv}_{\text{fuse}}\left(\text{Concat}\left(S_{\text{side}}^{(i)}\right)\right)\right) \]

where $X$ denotes the features of the $i$-th decoder block.

3. **Loss function**:

\[ L = \sum_{m=1}^{M} w_{\text{side}}^m \, l_{\text{side}}^m + w_{\text{fuse}} \, l_{\text{fuse}} \]

\[ l = -\sum_{(x,y)} \left[ G(x,y) \log P(x,y) + (1 - G(x,y)) \log(1 - P(x,y)) \right] \]

where $M$ is the number of side outputs, $G$ is the ground-truth map, and $P$ is the predicted saliency probability map.

With these improvements, SalFAU-Net's experimental results on multiple public datasets show that it is competitive in the salient object detection task, especially in complex scenes and for small objects.
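The three formula groups above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: every convolution is simplified to a 1x1 convolution (a per-pixel matrix product), $\sigma_1$ is assumed to be ReLU, all feature maps and side outputs are assumed to share one spatial size already, and all function names, shapes, and random weights are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def attention_gate(q, k, W_q, W_k, b_k, psi, b_psi):
    """Additive attention gate on per-pixel feature vectors.

    q, k: (H, W, C) feature maps; W_q, W_k: (C, C_int); b_k, psi: (C_int,).
    Returns the gated features alpha * k and the coefficients alpha.
    """
    hidden = np.maximum(q @ W_q + k @ W_k + b_k, 0.0)  # sigma_1, assumed ReLU
    q_att = hidden @ psi + b_psi                        # (H, W) attention logits
    alpha = sigmoid(q_att)                              # sigma_2, values in [0, 1]
    return alpha[..., None] * k, alpha

def side_output(features, w, b):
    """S_side = sigmoid(Conv(X)), with the conv simplified to a 1x1 projection."""
    return sigmoid(features @ w + b)

def fuse_side_maps(side_maps, w_fuse, b_fuse):
    """S_fuse = sigmoid(Conv_fuse(Concat(S_side))), again with a 1x1 conv."""
    stacked = np.stack(side_maps, axis=-1)              # (H, W, M)
    return sigmoid(stacked @ w_fuse + b_fuse)

def bce(pred, gt, eps=1e-7):
    """Binary cross-entropy summed over pixels; clipping avoids log(0)."""
    p = np.clip(pred, eps, 1.0 - eps)
    return float(-np.sum(gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p)))

def total_loss(side_maps, fused, gt, w_side=None, w_fuse=1.0):
    """Weighted sum of the side losses plus the fusion loss."""
    if w_side is None:
        w_side = [1.0] * len(side_maps)                 # equal weights (assumption)
    side_term = sum(w * bce(s, gt) for w, s in zip(w_side, side_maps))
    return side_term + w_fuse * bce(fused, gt)

# Toy run with random features standing in for M decoder blocks.
rng = np.random.default_rng(0)
H, W, C, C_INT, M = 8, 8, 4, 3, 3
gt = (rng.random((H, W)) > 0.5).astype(float)
side_maps = []
for _ in range(M):
    q = rng.normal(size=(H, W, C))
    k = rng.normal(size=(H, W, C))
    gated, alpha = attention_gate(
        q, k,
        rng.normal(size=(C, C_INT)), rng.normal(size=(C, C_INT)),
        np.zeros(C_INT), rng.normal(size=C_INT), 0.0,
    )
    side_maps.append(side_output(gated, rng.normal(size=C), 0.0))
fused = fuse_side_maps(side_maps, rng.normal(size=M), 0.0)
loss = total_loss(side_maps, fused, gt)
```

In a real network the side maps would first be upsampled to the input resolution and the weights would be learned; the sketch only shows how the attention coefficients, side outputs, fused map, and combined loss relate to one another.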

SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection

Dual-Branch Feature Fusion Network for Salient Object Detection

MAFNet: Multi-style attention fusion network for salient object detection

AWANet: Attentive-Aware Wide-Kernels Asymmetrical Network with Blended Contour Information for Salient Object Detection

Salient Object Detection Based on Visual Perceptual Saturation and Two-Stream Hybrid Networks

Complementary characteristics fusion network for weakly supervised salient object detection

Improved U-Net-Like Network for Visual Saliency Detection Based on Pyramid Feature Attention

Salient Object Detection via Bilateral Feature Fusion and Score Sorting Attention Mechanism

Compensated Attention Feature Fusion and Hierarchical Multiplication Decoder Network for RGB-D Salient Object Detection

Hybrid Attention Mechanism and Forward Feedback Unit for RGB-D Salient Object Detection

SLMSF-Net: A Semantic Localization and Multi-Scale Fusion Network for RGB-D Salient Object Detection

Attentive feature integration network for detecting salient objects in images

DeepSaliency: Multi-Task Deep Neural Network Model for Salient Object Detection

UDNet: Uncertainty-aware Deep Network for Salient Object Detection

AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection

Co-Saliency Detection With Co-Attention Fully Convolutional Network

Dynamic Selective Network for RGB-D Salient Object Detection

SODAWideNet++: Combining Attention and Convolutions for Salient Object Detection

MFC-Net: Multi-feature fusion cross neural network for salient object detection

Multi-Color Space Network for Salient Object Detection

Semantic feature-guided and correlation-aggregated salient object detection