GFNet: Gated Fusion Network for Video Saliency Prediction

Songhe Wu, Xiaofei Zhou, Yaoqi Sun, Yuhan Gao, Zunjie Zhu, Jiyong Zhang, Chenggang Yan
DOI: https://doi.org/10.1007/s10489-023-04861-5
IF: 5.3
2023-01-01
Applied Intelligence
Abstract: Most cutting-edge video saliency prediction models based on 3D convolutions adopt a fully convolutional encoder-decoder architecture, which provides a good spatiotemporal representation of salient regions. However, the spatiotemporal cues in these models are progressively diluted during decoding, which weakens their ability to locate salient regions. To address this limitation, we propose a simple and effective gated fusion network (GFNet) for video saliency prediction. Specifically, GFNet is built on a fully 3D convolutional encoder-decoder architecture and contains a key component, the gated fusion (GF) module, which acts as a message-screening unit between the encoder and decoder features at each level. In the GF module, the gate is computed by combining features from the previous decoder block and the current encoder block, and it is then used to weight the encoder features. In this way, GFNet controls the message passing between the encoder and decoder blocks at each level. Extensive experiments on four video saliency datasets show that our method achieves performance comparable to state-of-the-art models. The code is available at https://github.com/wusonghe/GFNet.
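To make the gating idea in the abstract concrete, here is a minimal NumPy sketch of one gated fusion step. It is not the authors' implementation (see their repository for that). Several details are assumptions: the encoder and previous-decoder features are combined by channel concatenation, the gate is produced by a 1x1x1 3D convolution followed by a sigmoid, and the decoder features are assumed to be already resized to the encoder's shape. The names `gated_fusion` and `pointwise_conv3d` are hypothetical.

```python
import numpy as np


def sigmoid(x):
    # Squashes gate logits into (0, 1).
    return 1.0 / (1.0 + np.exp(-x))


def pointwise_conv3d(x, weight, bias):
    """Hypothetical 1x1x1 3D convolution: a per-voxel linear map over channels.

    x: (C_in, T, H, W), weight: (C_out, C_in), bias: (C_out,).
    """
    # einsum mixes channels independently at every (t, h, w) location.
    return np.einsum("oc,cthw->othw", weight, x) + bias[:, None, None, None]


def gated_fusion(enc, dec_prev, weight, bias):
    """Assumed GF step: build a gate from [enc, dec_prev] and use it to weight enc.

    enc, dec_prev: (C, T, H, W); dec_prev is assumed already resized to enc's shape.
    Returns (gated encoder features, gate).
    """
    if enc.shape != dec_prev.shape:
        raise ValueError("encoder and decoder features must have the same shape")
    combined = np.concatenate([enc, dec_prev], axis=0)        # (2C, T, H, W)
    gate = sigmoid(pointwise_conv3d(combined, weight, bias))  # (C, T, H, W), in (0, 1)
    return gate * enc, gate


rng = np.random.default_rng(0)
C, T, H, W = 4, 2, 3, 3
enc = rng.standard_normal((C, T, H, W))
dec_prev = rng.standard_normal((C, T, H, W))
weight = rng.standard_normal((C, 2 * C)) * 0.1  # random stand-in for learned weights
bias = np.zeros(C)

gated, gate = gated_fusion(enc, dec_prev, weight, bias)
print(gated.shape)  # (4, 2, 3, 3)
```

Because every gate value lies in (0, 1), the module can only attenuate each encoder activation, never amplify it. This is one simple way to realize the "message screening" role described above. In a full network, the gated features would then be fed into the current decoder block together with the upsampled decoder features.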