Abstract: Recent advances in deep convolutional neural networks (CNNs) have boosted the development of video salient object detection (SOD), and many remarkable deep CNN-based video SOD models have been proposed. However, many existing deep CNN-based video SOD models still suffer from coarse boundaries of the salient object, which may be attributed to the loss of high-frequency information. Traditional graph-based video SOD models preserve object boundaries well by performing superpixel/supervoxel segmentation in advance, but they are weaker than the latest deep CNN-based models at highlighting the whole object, because they are limited by heuristic graph clustering algorithms. To tackle this problem, we address video SOD under the framework of graph convolution networks (GCNs), taking advantage of both graph models and deep neural networks. Specifically, a superpixel-level spatiotemporal graph is first constructed among multiple frame-pairs by exploiting the motion cues implied in the frame-pairs. The graph data is then fed into the devised multi-stream attention-aware GCN, where a novel Edge-Gated graph convolution (GC) operation is proposed to boost the aggregation of saliency information on the graph data. A novel attention module is designed to encode the spatiotemporal semantic information via adaptive selection of graph nodes and fusion of the static-specific and the motion-specific graph embeddings. Finally, a smoothness-aware regularization term is proposed to enhance the uniformity of the salient object. Ideally, graph nodes (superpixels) that inherently belong to the same class will be clustered together in the learned embedding space. Extensive experiments have been conducted on three widely used datasets. Compared with fourteen state-of-the-art video SOD models, the proposed method retains the salient object boundaries well and possesses a strong learning ability, which shows that this work is a good practice for designing GCNs for video SOD.
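
The abstract names two graph operations without giving their formulas: an edge-gated aggregation over superpixel nodes and a smoothness-aware regularization term. The sketch below illustrates the general idea of each under stated assumptions. It is not the paper's formulation: the scalar sigmoid gate, the residual update, the squared-distance penalty, and all function and parameter names (edge_gated_step, smoothness_penalty, w_self, w_nbr) are hypothetical choices made for illustration only.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def edge_gated_step(features, neighbors, w_self, w_nbr, eps=1e-8):
    """One generic edge-gated aggregation step (assumed form, not the paper's).

    features:  list of node feature vectors, one per superpixel.
    neighbors: dict mapping node index -> list of neighbor indices.
    w_self, w_nbr: hypothetical gate parameters, same length as a feature vector.
    Each edge (i, j) gets a scalar gate in (0, 1); neighbor features are
    averaged with gate weights and added to the node's own feature.
    """
    updated = []
    for i, h_i in enumerate(features):
        agg = [0.0] * len(h_i)
        gate_sum = 0.0
        for j in neighbors.get(i, []):
            h_j = features[j]
            gate = sigmoid(dot(w_self, h_i) + dot(w_nbr, h_j))
            gate_sum += gate
            agg = [a + gate * x for a, x in zip(agg, h_j)]
        # Residual update; nodes without neighbors keep their feature unchanged.
        updated.append([h + a / (gate_sum + eps) for h, a in zip(h_i, agg)])
    return updated


def smoothness_penalty(embeddings, weighted_edges):
    """Sum of w_ij * ||z_i - z_j||^2 over edges (a common smoothness term,
    assumed here; the paper's exact regularizer is not given in the abstract).

    weighted_edges: iterable of (i, j, w_ij) tuples.
    """
    total = 0.0
    for i, j, w in weighted_edges:
        diff = sum((a - b) ** 2 for a, b in zip(embeddings[i], embeddings[j]))
        total += w * diff
    return total


if __name__ == "__main__":
    feats = [[1.0], [3.0], [5.0]]
    nbrs = {0: [1, 2]}
    print(edge_gated_step(feats, nbrs, [0.0], [0.0]))
    print(smoothness_penalty([[0, 0], [1, 0], [1, 1]], [(0, 1, 1.0), (1, 2, 0.5)]))
```

With all gate parameters set to zero, every gate equals 0.5 and the aggregation reduces to a plain neighbor average, which makes the role of the gate easy to see: learned gate parameters let a node down-weight neighbors across an object boundary. The penalty is small when connected superpixels have similar embeddings, which matches the stated goal of clustering same-class nodes together.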

Motion Context Guided Edge-preserving Network for Video Salient Object Detection

Bidirectional Cross-Selective Attention Network for Video Salient Object Detection

Edge Preserving and Multi-Scale Contextual Neural Network for Salient Object Detection

Motion Guided Attention for Video Salient Object Detection

BCNet: Bidirectional Collaboration Network for Edge-Guided Salient Object Detection

Depth Cue Enhancement and Guidance Network for RGB-D Salient Object Detection

EGNet: Edge Guidance Network for Salient Object Detection

A Unified Multiple Inducible Co-attentions and Edge Guidance Network for Co-saliency Detection

A Video Salient Object Detection Model Guided By Spatio-Temporal Prior

Salient Object Detection in Video Based on Dynamic Attention Center

Video Salient Object Detection via Robust Seeds Extraction and Multi-Graphs Manifold Propagation

CMGNet: Context-aware Middle-Layer Guidance Network for Salient Object Detection

Attention Embedded Spatio-Temporal Network for Video Salient Object Detection

Progressively Real-Time Video Salient Object Detection Via Cascaded Fully Convolutional Networks with Motion Attention

Video Salient Object Detection Network with Bidirectional Memory and Spatiotemporal Constraints

Multi-Stream Attention-Aware Graph Convolution Network for Video Salient Object Detection

Video Salient Object Detection Via Spatiotemporal Co-Attention and Global Structural Dependence

Real-time salient object detection with boundary information guidance

STEG-Net: Spatiotemporal Edge Guidance Network for Video Salient Object Detection

EGA-Net: Edge feature enhancement and global information attention network for RGB-D salient object detection

A Cross-Modal Edge-Guided Salient Object Detection for RGB-D Image