Abstract:<p>Despite their wide success in many vision tasks, Convolutional Neural Networks (CNNs) still find saliency detection challenging because of their limited receptive fields and the lack of sufficiently discriminative context until very late layers. In this paper, beyond spatial convolution, we propose a Spatio-Frequency Network (SFNet) that exploits spatio-frequency clues to effectively enlarge the receptive fields of CNN layers and, more importantly, strengthen their spatial discrimination for better saliency detection. In particular, the proposed SFNet contains a carefully designed Frequency Residual Module (FRM) that captures the holistic representation of the whole image within the frequency domain. The FRM leverages discrete and inverse discrete wavelet transformation to alternately transfer global spatial features into frequency domains, to assist fast and accurate salient object detection. Besides, SFNet also includes an Aggregation of Frequency and Spatial Feature (AFSF) module to jointly integrate the two domain features guided by saliency results in a top-down manner. In this way, the aggregated features of each layer contain rich holistic contexts, and the network can eventually explore more complete salient object parts and details by progressively integrating saliency predictions. Extensive experiments on six widely-used saliency detection datasets clearly demonstrate the advantages of our proposed model compared with state-of-the-art methods.</p>
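The abstract says the FRM moves features between the spatial and frequency domains with a discrete wavelet transform (DWT) and its inverse. The following is a minimal sketch of that round trip, assuming a one-level 2-D Haar wavelet and a single-channel NumPy array; the abstract does not name the wavelet, and the function names `haar_dwt2` and `haar_idwt2` are hypothetical, not taken from the paper. It illustrates the transform pair only, not the FRM itself.

```python
import numpy as np

def haar_dwt2(x):
    """One-level 2-D Haar DWT of an array with even height and width.

    Returns four sub-bands, each half the size of `x` per side.
    Sub-band naming (lh vs. hl) varies between libraries; this is one convention.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-pass in both directions
    lh = (a + b - c - d) / 2.0  # difference between rows
    hl = (a - b + c - d) / 2.0  # difference between columns
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution array."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=ll.dtype)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 0::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x

# Stand-in for one channel of a CNN feature map (random values, illustration only).
rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))

ll, lh, hl, hh = haar_dwt2(feat)
recon = haar_idwt2(ll, lh, hl, hh)

print(ll.shape)                  # each sub-band is 4x4
print(np.allclose(recon, feat))  # the round trip is lossless
```

Unlike strided pooling, which also halves resolution, this transform discards nothing: the four sub-bands together hold the same information as the input, so a network can process features at the coarser scale and still return to full resolution with the inverse transform.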

What problem does this paper attempt to address?

The paper addresses the difficulty that convolutional neural networks (CNNs) have with saliency detection. Although CNNs have succeeded in many vision tasks, the paper identifies two main limitations for saliency detection:

1. **Limited receptive field**: Early CNN layers have small receptive fields, so the network cannot capture enough discriminative context until very deep layers. This makes it hard to capture the holistic information of the image early on.
2. **Low-resolution high-level features**: High-level features are usually too coarse to produce saliency maps with clear details and boundaries.

To overcome these limitations, the paper proposes a deep-learning-based saliency detection model, the Spatio-Frequency Network (SFNet). The model enlarges the receptive field by transforming spatial CNN features into the frequency domain, and strengthens spatial discrimination by jointly using frequency-domain and spatial-domain features, yielding more refined saliency detection.

### Main contributions

1. **A new spatio-frequency CNN**: The network mines cues in the spatial and frequency domains simultaneously, which the paper presents as new to deep saliency detection.
2. **The Frequency Residual Module (FRM)**: This module captures a holistic representation of the entire image and has a large receptive field.
3. **The Aggregation of Frequency and Spatial Feature (AFSF) module**: This module jointly integrates frequency-enhanced features and semantically rich spatial CNN features at each convolutional layer, with saliency predictions guiding the integration progressively from the top layer to the bottom layer.

With these components, the model captures entire salient objects and produces sharp saliency maps, showing advantages over existing methods on six widely used saliency detection datasets.
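The receptive-field claim can be checked numerically. The sketch below assumes a one-level Haar low-pass band as the frequency transform and an all-ones 3x3 window as a stand-in for a convolution; both are illustrative choices, not the paper's configuration, and `haar_ll`, `window_sum_3x3`, and `footprint` are hypothetical helper names. It counts how many input pixels can influence a single 3x3 output, with and without the transform in front.

```python
import numpy as np

def haar_ll(x):
    """Low-pass (LL) band of a one-level 2-D Haar DWT; halves each side."""
    return (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 2.0

def window_sum_3x3(x, r, c):
    """Stand-in for a 3x3 convolution with all-ones weights, centred at (r, c)."""
    return x[r - 1:r + 2, c - 1:c + 2].sum()

def footprint(size, levels, r, c):
    """Count input pixels that influence one 3x3 output after `levels` LL steps."""
    count = 0
    for i in range(size):
        for j in range(size):
            x = np.zeros((size, size))
            x[i, j] = 1.0  # unit impulse at a single input pixel
            for _ in range(levels):
                x = haar_ll(x)
            if window_sum_3x3(x, r, c) != 0.0:
                count += 1
    return count

plain = footprint(16, 0, 8, 8)      # 3x3 window applied directly to the input
one_level = footprint(16, 1, 4, 4)  # same window applied to a level-1 band
two_level = footprint(16, 2, 1, 1)  # same window applied to a level-2 band
print(plain, one_level, two_level)  # prints: 9 36 144
```

Each wavelet level doubles the side of the region seen by the same 3x3 window (3, 6, then 12 pixels), which is the sense in which operating on frequency-domain features enlarges the receptive field without a larger kernel. The other sub-bands have the same spatial support as the LL band, so the count applies to them as well.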

Deep spatio-frequency saliency detection

Spatial Frequency Enhanced Salient Object Detection

CSA-Net: Deep Cross-Complementary Self Attention and Modality-Specific Preservation for Saliency Detection

Attentive feature integration network for detecting salient objects in images

Accurate salient object detection via dense recurrent connections and residual-based hierarchical feature integration

Deep supervised visual saliency model addressing low-level features

DeepSaliency: MultiTask Deep Neural Network Model for Salient Object Detection

Self-Attention Recurrent Network for Saliency Detection

Enriched Feature Representation and Combination for Deep Saliency Detection

Holistic and Deep Feature Pyramids for Saliency Detection

Saliency Detection Based on Multiple-Level Feature Learning

AWANet: Attentive-Aware Wide-Kernels Asymmetrical Network with Blended Contour Information for Salient Object Detection

A Deep Spatial Contextual Long-term Recurrent Convolutional Network for Saliency Detection

Residual attentive feature learning network for salient object detection

Co-Saliency Detection With Co-Attention Fully Convolutional Network

End-to-End Video Saliency Detection Via a Deep Contextual Spatiotemporal Network

Deep Contrast Learning for Salient Object Detection

SAC-Net: Spatial Attenuation Context for Salient Object Detection

Saliency Detection Within a Deep Convolutional Architecture

Contrast-Oriented Deep Neural Networks for Salient Object Detection

AMDFNet: Adaptive multi-level deformable fusion network for RGB-D saliency detection