Video Salient Object Detection Via Spatiotemporal Co-Attention and Global Structural Dependence

Bing Liu,Tiantian Wang,Lina Gao,Zheng Yan,Mingzhu Xu
DOI: https://doi.org/10.2139/ssrn.4342059
2023-01-01
SSRN Electronic Journal
Abstract:At present, many video salient object detection methods based on deep convolutional neural networks still suffer from coarse object edges, poor detection integrity and wrongly highlighted background, which may be attributed to the loss of high-frequency information and the inadequacy of global structure information. To solve these problems, we find a new way based on graph neural network learning methods in non-Euclidean space. Specifically, we propose a novel video salient object detection method based on spatiotemporal co-attention and global structural dependence. First, in order to learn high-quality salient spatial features, we design a bidirectional progressive salient object spatial feature extraction module (Bi-PSSFEM), which fuses multi-scale spatial information in a bidirectional and recursive manner to fully capture important salient spatial information. Second, we design a novel ConvLSTM module embedded with the co-attention mechanism (CoAConvLSTM), which improves the continuous consistency of salient objects in the spatiotemporal domain by considering the spatiotemporal correlation between adjacent video frames. Furthermore, in order to identify salient objects of different scales in dynamic visual scenes, we adopt our proposed CoAConvLSTMs with different dilated rates to form the ASPP-CoAConvLSTM in parallel, so as to capture spatiotemporal information of rich spatial receptive fields. Finally, a spatiotemporal graph convolution network (STGCN) is designed to fully mine the spatiotemporal context structure information in non-Euclidian space, so as to further improve the network model’s ability to obtain salient spatiotemporal clues. Experiments on four public benchmark datasets show that our proposed method is superior to the current mainstream video salient object detection methods based on deep convolutional neural networks.
What problem does this paper attempt to address?