Abstract: Many video salient object detection methods based on deep convolutional neural networks still suffer from coarse object edges, incomplete detection of objects, and wrongly highlighted background, which may be attributed to the loss of high-frequency information and insufficient global structure information. To address these problems, we turn to graph neural network learning in non-Euclidean space. Specifically, we propose a novel video salient object detection method based on spatiotemporal co-attention and global structural dependence. First, to learn high-quality salient spatial features, we design a bidirectional progressive salient object spatial feature extraction module (Bi-PSSFEM), which fuses multi-scale spatial information in a bidirectional and recursive manner to capture important salient spatial information. Second, we design a novel ConvLSTM module embedded with a co-attention mechanism (CoAConvLSTM), which improves the consistency of salient objects in the spatiotemporal domain by considering the spatiotemporal correlation between adjacent video frames. Furthermore, to identify salient objects of different scales in dynamic visual scenes, we arrange CoAConvLSTMs with different dilation rates in parallel to form the ASPP-CoAConvLSTM, which captures spatiotemporal information over a rich range of spatial receptive fields. Finally, a spatiotemporal graph convolution network (STGCN) is designed to mine the spatiotemporal context structure information in non-Euclidean space, further improving the network model's ability to obtain salient spatiotemporal clues. Experiments on four public benchmark datasets show that our proposed method outperforms current mainstream video salient object detection methods based on deep convolutional neural networks.
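To make the idea of graph convolution over spatiotemporal context more concrete, the following is a minimal sketch of a single graph convolution layer in plain Python. It is not the STGCN described in the abstract, whose propagation rule is not given here. It assumes the widely used normalized-adjacency rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W), and the toy graph (two regions in each of two adjacent frames, linked within and across frames), the features, and all function names are hypothetical choices made only for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square adjacency matrix."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own feature.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def graph_conv(adj, feats, weight):
    """One assumed layer: ReLU(A_norm @ feats @ weight)."""
    a_norm = normalize_adjacency(adj)
    out = matmul(matmul(a_norm, feats), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical toy graph: nodes 0 and 1 are regions in frame t,
# nodes 2 and 3 are the same regions in frame t+1.
# Edges 0-1 and 2-3 are spatial; edges 0-2 and 1-3 are temporal.
adj = [
    [0, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]  # one row per node
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity, so only the aggregation is visible

out = graph_conv(adj, feats, weight)
for node, row in enumerate(out):
    print(node, [round(v, 3) for v in row])
```

With an identity weight matrix, each output row is simply the degree-normalized average of a node's own feature and those of its spatial and temporal neighbors, which is the sense in which such a layer propagates context along graph edges rather than over a regular pixel grid.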

Video Salient Object Detection Via Spatiotemporal Co-Attention and Global Structural Dependence

[Polymorphisms of five short tandem repeat systems in Chinese Han population in Chengdu]

Multilevel Spatial-Temporal Feature Aggregation for Video Object Detection

Video Salient Object Detection via Contrastive Features and Attention Modules

Video Salient Object Detection via Fully Convolutional Networks

Video-based Salient Object Detection Via Spatio-Temporal Difference and Coherence

End-to-End Video Saliency Detection Via a Deep Contextual Spatiotemporal Network

Spatial attention-guided deformable fusion network for salient object detection

Local Attention Sequence Model for Video Object Detection

Video Saliency Prediction Using Enhanced Spatiotemporal Alignment Network

Global-prior-guided fusion network for salient object detection

Video Saliency Prediction using Spatiotemporal Residual Attentive Networks

Global Context Encoding For Salient Objects Detection

Guidance and Teaching Network for Video Salient Object Detection

Spatio-Temporal Self-Attention Network for Video Saliency Prediction

Motion-Aware Memory Network for Fast Video Salient Object Detection

Contrast-Oriented Deep Neural Networks for Salient Object Detection

DeepSaliency: MultiTask Deep Neural Network Model for Salient Object Detection

Learning Complementary Spatial-Temporal Transformer for Video Salient Object Detection

Global contextual guided residual attention network for salient object detection

Salient object detection by aggregating contextual information