SalFAU-Net: Saliency Fusion Attention U-Net for Salient Object Detection

Kassaw Abraham Mulat,Zhengyong Feng,Tegegne Solomon Eshetie,Ahmed Endris Hasen
2024-05-05
Abstract: Salient object detection (SOD) remains an important task in computer vision, with applications ranging from image segmentation to autonomous driving. Fully convolutional network (FCN)-based methods have made remarkable progress in visual saliency detection over the last few decades. However, these methods have limitations in accurately detecting salient objects, particularly in challenging scenes with multiple objects, small objects, or low-resolution objects. To address this issue, we propose a Saliency Fusion Attention U-Net (SalFAU-Net) model, which incorporates a saliency fusion module into each decoder block of the Attention U-Net model to generate saliency probability maps from each decoder block. SalFAU-Net employs an attention mechanism to selectively focus on the most informative regions of an image and suppress non-salient regions. We train SalFAU-Net on the DUTS dataset using a binary cross-entropy loss function and evaluate it on six popular SOD datasets. The experimental results demonstrate that SalFAU-Net achieves competitive performance compared to other methods in terms of mean absolute error (MAE), F-measure, S-measure, and E-measure.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses **the challenges of salient object detection (SOD)** in complex scenes and in images containing multiple, small, or low-resolution objects, where existing methods struggle to detect salient objects accurately. Specifically:

1. **Salient object detection in complex scenes**: Traditional methods and early deep-learning methods perform poorly on complex backgrounds or multi-object scenes and are prone to false positives and false negatives.
2. **Small object detection**: Existing fully convolutional network (FCN) methods often fail to capture small objects accurately, which leads to unsatisfactory detection results.
3. **Low-resolution object detection**: When the image resolution is low, existing methods have difficulty extracting enough features to identify salient objects accurately.

To solve these problems, the authors propose a model named **SalFAU-Net**, which combines an attention mechanism with a Saliency Fusion Module (SFM) to improve the accuracy of salient object detection. The main improvements of this model are:

- **Saliency Fusion Module (SFM)**: A Saliency Fusion Module is added to each decoder block to generate a saliency probability map for that block, and these maps are fused to obtain the final saliency map. This helps capture salient features of different scales and shapes more accurately.
- **Attention mechanism**: Through the Attention Gate module, the model selectively focuses on the most important regions of the image while suppressing non-salient regions, which improves detection accuracy.
- **Multi-level feature fusion**: Multi-level features and context information are used to improve the model's adaptability to complex scenes.

### Formula Summary

1. **Attention coefficient calculation**:

\[ q_{\text{att}}^l = \Psi\left(\sigma_1\left(W_q^T q_i^l + W_k^T K_i + b_k\right)\right) + b_\Psi \]

\[ \alpha_i^l = \sigma_2\left(q_{\text{att}}^l(q_i^l, K_i; \Theta_{\text{att}})\right) \]

\[ \hat{x}_{i,c}^l = \alpha_i^l \cdot K_{i,c} \]

where $\sigma_2(x_{i,c}) = \frac{1}{1 + \exp(-x_{i,c})}$ is the sigmoid activation function.

2. **Saliency probability map generation**:

\[ S_{\text{side}}^{(i)} = \sigma\left(\text{Conv}^{(i)}(X)\right) \]

\[ S_{\text{fuse}} = \sigma\left(\text{Conv}_{\text{fuse}}\left(\text{Concat}\left(S_{\text{side}}^{(i)}\right)\right)\right) \]

3. **Loss function**:

\[ L = \sum_{m=1}^{M} w_{\text{side}}^m\, l_{\text{side}}^m + w_{\text{fuse}}\, l_{\text{fuse}} \]

\[ l = -\sum_{(x,y)} \left[ G(x,y) \log P(x,y) + (1 - G(x,y)) \log(1 - P(x,y)) \right] \]

where each loss term $l$ is the binary cross-entropy between the ground-truth map $G$ and the predicted saliency map $P$.

With these improvements, experimental results on multiple public datasets show that SalFAU-Net is competitive in salient object detection, especially in complex scenes and for small objects.
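To make the three formula groups above concrete, here is a minimal pure-Python sketch that evaluates them on tiny hand-made inputs. It is an illustration under stated assumptions, not the authors' implementation: $\sigma_1$ is assumed to be ReLU, $\text{Conv}_{\text{fuse}}$ is assumed to be a 1x1 convolution (a per-pixel weighted sum over side maps), all loss weights default to 1, and the loss uses the standard binary cross-entropy with a $(1 - G)$ factor in its second term. All function and parameter names are hypothetical.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, used for sigma_2 and for the saliency maps."""
    return 1.0 / (1.0 + math.exp(-x))


def attention_coefficient(q, k, w_q, w_k, b_k, psi, b_psi):
    """Attention coefficient alpha for a single pixel.

    q, k   : feature vectors at that pixel (lists of floats)
    w_q/w_k: lists of rows, each row being one row of W_q^T / W_k^T
    psi    : vector that maps the hidden activation to a scalar
    """
    hidden = []
    for row_q, row_k, b in zip(w_q, w_k, b_k):
        pre = (sum(w * x for w, x in zip(row_q, q))
               + sum(w * x for w, x in zip(row_k, k)) + b)
        hidden.append(max(0.0, pre))  # sigma_1, ASSUMED to be ReLU
    q_att = sum(p * h for p, h in zip(psi, hidden)) + b_psi
    return sigmoid(q_att)  # sigma_2


def gate_features(alpha, k):
    """x_hat_{i,c} = alpha_i * K_{i,c} for every channel c."""
    return [alpha * k_c for k_c in k]


def fuse_side_maps(side_logits, fuse_weights, fuse_bias):
    """Side maps S_side = sigmoid(logits); fused map = sigmoid(1x1 conv of them).

    side_logits: M flattened maps of equal length (already upsampled).
    The 1x1 convolution is an ASSUMPTION about Conv_fuse.
    """
    side_maps = [[sigmoid(v) for v in m] for m in side_logits]
    fused = []
    for pixel in zip(*side_maps):  # one value per side map at this pixel
        z = sum(w * s for w, s in zip(fuse_weights, pixel)) + fuse_bias
        fused.append(sigmoid(z))
    return side_maps, fused


def bce(pred, target, eps=1e-7):
    """Binary cross-entropy summed over pixels."""
    total = 0.0
    for p, g in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # avoid log(0)
        total -= g * math.log(p) + (1.0 - g) * math.log(1.0 - p)
    return total


def total_loss(side_maps, fused, target, w_side=None, w_fuse=1.0):
    """L = sum_m w_side^m * l_side^m + w_fuse * l_fuse."""
    if w_side is None:
        w_side = [1.0] * len(side_maps)  # ASSUMED uniform weights
    side_term = sum(w * bce(s, target) for w, s in zip(w_side, side_maps))
    return side_term + w_fuse * bce(fused, target)


if __name__ == "__main__":
    # Two side outputs over a 2-pixel "image"; ground truth marks pixel 0 salient.
    maps, fused_map = fuse_side_maps([[2.0, -1.0], [1.5, -2.0]], [0.5, 0.5], 0.0)
    print(total_loss(maps, fused_map, [1.0, 0.0]))
```

In a real network these operations would run on tensors, with learned convolutions in place of the explicit loops; the scalar form is only meant to show how the side losses and the fused loss combine into a single training objective.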