Deep spatio-frequency saliency detection

Zun Li, Congyan Lang, Tao Wang, Yidong Li, Jiashi Feng
DOI: https://doi.org/10.1016/j.neucom.2020.05.109
IF: 6
2021-09-01
Neurocomputing
Abstract: Despite the wide success in many vision tasks, it is still challenging for Convolutional Neural Networks (CNNs) to perform saliency detection due to their limited receptive fields and lack of enough discriminative contexts until very late layers. In this paper, beyond spatial convolution, we propose a Spatio-Frequency Network (SFNet) that exploits spatio-frequency clues to effectively enlarge the receptive fields of CNN layers and more importantly, strengthen their spatial discrimination for better saliency detection. In particular, the proposed SFNet contains a carefully designed Frequency Residual Module (FRM) that captures the holistic representation of the whole image within the frequency domain. The FRM leverages discrete and inverse discrete wavelet transformation to alternatively transfer global spatial features into frequency domains, to assist fast and accurate salient object detection. Besides, SFNet also includes an Aggregation of Frequency and Spatial Feature (AFSF) module to jointly integrate the two domain features guided by saliency results in a top-down manner. In this way, the aggregation features per layer contain rich holistic contexts, and the network can eventually explore more complete salient object parts and details by progressively integrating saliency predictions. Extensive experiments on six widely-used saliency detection datasets clearly demonstrate the advantages of our proposed model compared with state-of-the-art.
computer science, artificial intelligence
What problem does this paper attempt to address?
The paper addresses the difficulties that convolutional neural networks (CNNs) face in saliency detection. Although CNNs have been highly successful in many visual tasks, the paper identifies two main limitations for saliency detection:

1. **Limited receptive field**: early CNN layers have small receptive fields, so they do not capture enough discriminative context until very deep layers. This makes it hard to capture the overall content of the image quickly.
2. **Low-resolution high-level features**: high-level features usually have too low a resolution to produce saliency maps with clear details and sharp boundaries.

To overcome these limitations, the paper proposes a deep-learning-based saliency detection model, the Spatio-Frequency Network (SFNet). The model enlarges the receptive field by transforming spatial CNN features into the frequency domain, and it strengthens spatial discrimination by using frequency-domain and spatial-domain features jointly, which yields more refined saliency maps.

### Main contributions

1. **A spatio-frequency CNN**: the network mines cues in the spatial domain and the frequency domain simultaneously, which the authors present as a new direction for deep saliency detection.
2. **The Frequency Residual Module (FRM)**: this module captures a holistic representation of the entire image and has a large receptive field.
3. **The Aggregation of Frequency and Spatial Feature (AFSF) module**: this module integrates frequency-enhanced features with semantically rich spatial CNN features at each convolutional layer, guided by saliency predictions that are progressively integrated from the top layer down to the bottom layer.
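The abstract states that the FRM uses the discrete wavelet transform (DWT) and its inverse to move features into the frequency domain and back. The exact wavelet and layer design are not given in this summary, so the NumPy sketch below is only an illustration under stated assumptions: a single-level orthonormal Haar wavelet on a single-channel map, and a fixed 3x3 mean filter (`box3`) standing in for a learned convolution. All function names (`haar_dwt2`, `haar_idwt2`, `box3`, `frequency_residual_block`, `filter_ll`) are hypothetical. It shows two things: the DWT/IDWT pair is lossless, so moving to the wavelet domain discards nothing; and a 3x3 filter applied to the half-resolution LL band affects a 6x6 patch of the input, versus a 3x3 patch at full resolution, which is one way a frequency-domain path can widen the receptive field.

```python
import numpy as np


def haar_dwt2(x: np.ndarray):
    """Single-level orthonormal 2D Haar DWT of an (H, W) array with even H and W.

    Returns four (H/2, W/2) subbands: LL (local average) and three detail bands.
    """
    a = x[0::2, 0::2]  # top-left pixel of every 2x2 block
    b = x[0::2, 1::2]  # top-right
    c = x[1::2, 0::2]  # bottom-left
    d = x[1::2, 1::2]  # bottom-right
    ll = (a + b + c + d) / 2.0  # low-frequency approximation
    lh = (a - b + c - d) / 2.0  # left column minus right column
    hl = (a + b - c - d) / 2.0  # top row minus bottom row
    hh = (a - b - c + d) / 2.0  # diagonal difference
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the (2h, 2w) array from its four subbands."""
    h, w = ll.shape
    x = np.empty((2 * h, 2 * w), dtype=np.float64)
    x[0::2, 0::2] = (ll + lh + hl + hh) / 2.0
    x[0::2, 1::2] = (ll - lh + hl - hh) / 2.0
    x[1::2, 0::2] = (ll + lh - hl - hh) / 2.0
    x[1::2, 1::2] = (ll - lh - hl + hh) / 2.0
    return x


def box3(z):
    """3x3 mean filter with edge padding (a fixed stand-in for a learned conv)."""
    p = np.pad(z, 1, mode="edge")
    h, w = z.shape
    return sum(p[i:i + h, j:j + w] for i in range(3) for j in range(3)) / 9.0


def frequency_residual_block(x, subband_fn):
    """Hypothetical frequency-residual step: DWT -> transform subbands -> IDWT, plus a skip connection."""
    bands = subband_fn(*haar_dwt2(x))
    return x + haar_idwt2(*bands)


def filter_ll(ll, lh, hl, hh):
    """Filter only the low-frequency band; keep the detail bands unchanged."""
    return box3(ll), lh, hl, hh


if __name__ == "__main__":
    x = np.random.default_rng(0).standard_normal((8, 8))
    print(np.allclose(haar_idwt2(*haar_dwt2(x)), x))  # True: the transform is lossless

    delta = np.zeros((8, 8))
    delta[4, 4] = 1.0  # a single "on" pixel
    y = frequency_residual_block(delta, filter_ll)
    print(np.count_nonzero(np.abs(y - delta) > 1e-12))  # 36: a 6x6 footprint via the LL band
    print(np.count_nonzero(box3(delta)))  # 9: a 3x3 footprint at full resolution
```

Each additional DWT level would double the footprint again; whether SFNet stacks levels this way, or uses Haar at all, is not stated in this summary.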
Through these contributions, the model can capture entire salient objects and produce sharp saliency maps, and it shows advantages over state-of-the-art methods on six widely used saliency detection datasets.
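The third contribution describes fusing frequency-enhanced and spatial features at each layer under the guidance of saliency predictions passed from the top layer down. The summary gives no equations for AFSF, so the following is only a toy NumPy sketch under stated assumptions: maps are single-channel, each level is twice the size of the one above, coarse predictions are upsampled by nearest neighbour, the sigmoid of the coarse logits acts as a per-pixel gate between the two feature types, and the fused result is added as a residual correction to the upsampled logits. The names `upsample2x`, `fuse_level` and `top_down_decode` are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def upsample2x(s):
    """Nearest-neighbour 2x upsampling of an (H, W) map."""
    return np.repeat(np.repeat(s, 2, axis=0), 2, axis=1)


def fuse_level(spatial, freq, coarse_logits):
    """Hypothetical saliency-guided fusion for one decoder level.

    spatial, freq: (H, W) single-channel feature maps at this level.
    coarse_logits: (H/2, W/2) saliency logits from the level above.
    Returns refined (H, W) saliency logits.
    """
    up = upsample2x(coarse_logits)
    gate = sigmoid(up)  # per-pixel weight in (0, 1) taken from the coarser prediction
    fused = gate * freq + (1.0 - gate) * spatial  # favour frequency features where saliency is high
    return up + fused  # residual refinement of the upsampled prediction


def top_down_decode(levels, top_logits):
    """Refine a coarse saliency map level by level, from the coarsest to the finest.

    levels: list of (spatial, freq) pairs ordered coarsest first.
    """
    logits = top_logits
    for spatial, freq in levels:
        logits = fuse_level(spatial, freq, logits)
    return sigmoid(logits)  # final saliency map with values in (0, 1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    levels = [(rng.standard_normal((n, n)), rng.standard_normal((n, n))) for n in (4, 8, 16)]
    saliency = top_down_decode(levels, rng.standard_normal((2, 2)))
    print(saliency.shape)  # (16, 16)
```

A gate is simply the smallest mechanism that lets the coarse prediction decide, per pixel, how much to trust each feature type; learned attention, or concatenation followed by convolution, would be equally plausible readings of the summary.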