Abstract: Memory-based networks have achieved tremendous success in video object segmentation. However, these methods still suffer from unfaithful segmentation and low efficiency in complicated video scenarios. The reasons are mainly threefold: 1) weak perception of fast-moving targets, because each frame is memorized individually without capturing inter-frame motion; 2) poor discrimination between visually similar objects, due to a limited receptive field; 3) redundant computation caused by matching against all memorized frames. To address these issues, we propose a Temporo-Spatial Parallel Sparse Memory network (TSPSM) for efficient video object segmentation. TSPSM constructs a temporal memory bank and a spatial memory bank in parallel to memorize complementary, discriminative object cues. The temporal bank exploits discriminative motion cues, while the spatial bank mines spatial context cues between adjacent frames with large receptive fields, thereby alleviating the ambiguity caused by similar instances and fast movements. To reduce redundant computation during the matching step without sacrificing performance, we further design a parallel sparse memory reader over the constructed memory banks, which efficiently retrieves the relevant temporal and spatial information in parallel. Experiments demonstrate that TSPSM achieves state-of-the-art performance at real-time speed on the DAVIS and YouTube-VOS benchmarks. Furthermore, extensive experiments show that the proposed TSPSM module can be applied to existing methods as a generic plugin and significantly improves their performance.
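The abstract does not say how the parallel sparse memory reader chooses which memory entries to match. One common way to make memory matching sparse is to keep only the top-k affinities per query before the softmax. The Python sketch below assumes that scheme, a plain dot-product affinity, and hypothetical names (`sparse_read`, `parallel_sparse_read`); it illustrates the general idea of reading two independent banks sparsely and is not TSPSM's actual implementation.

```python
import math
from typing import List, Tuple

Vector = List[float]
Bank = Tuple[List[Vector], List[Vector]]  # (keys, values); hypothetical layout


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sparse_read(query: Vector, keys: List[Vector], values: List[Vector], k: int) -> Vector:
    """Read one memory bank using only the k keys most similar to the query.

    Assumption: dot-product affinity and a softmax taken over the retained
    top-k entries only, so all other entries contribute nothing to the output.
    """
    if not keys:
        raise ValueError("memory bank is empty")
    k = min(k, len(keys))
    scores = [dot(query, key) for key in keys]
    # Indices of the k largest affinities.
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax restricted to the top-k scores.
    m = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - m) for i in top]
    total = sum(weights)
    out = [0.0] * len(values[0])
    for w, i in zip(weights, top):
        for d in range(len(out)):
            out[d] += (w / total) * values[i][d]
    return out


def parallel_sparse_read(query: Vector, temporal_bank: Bank, spatial_bank: Bank, k: int) -> Vector:
    """Query the temporal and spatial banks independently and concatenate the results."""
    t = sparse_read(query, *temporal_bank, k)
    s = sparse_read(query, *spatial_bank, k)
    return t + s


if __name__ == "__main__":
    temporal = ([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], [[1.0], [2.0], [3.0]])
    spatial = ([[0.0, 1.0], [1.0, 1.0]], [[10.0], [20.0]])
    print(parallel_sparse_read([1.0, 0.0], temporal, spatial, k=1))  # [1.0, 20.0]
```

With `k=1`, each bank returns the value of its single best-matching key. Note that this sketch still scores every stored key; only the softmax and the value aggregation are restricted to k entries. Because the two bank reads share no state, they could also be run concurrently.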

Looking Fast and Slow: Memory-Guided Mobile Video Object Detection

Fast Real-Time Video Object Segmentation with a Tangled Memory Network

VMM: Viewpoint-based Memory Mechanism for Object Detection of Moving Sensors

Learning Quality-aware Dynamic Memory for Video Object Segmentation

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Object Guided External Memory Network for Video Object Detection

Multi-view Aggregation for Real-Time Accurate Object Detection of a Moving Camera

Learning Video Object Segmentation with Visual Memory

Memory-based Object Detection in Surveillance Scenes

Global Memory and Local Continuity for Video Object Detection

Motion-Aware Memory Network for Fast Video Salient Object Detection

Beyond Appearance: Multi-Frame Spatio-Temporal Context Memory Networks for Efficient and Robust Video Object Segmentation

Robust and Efficient Memory Network for Video Object Segmentation

Memory Enhanced Global-Local Aggregation for Video Object Detection

Memory-based Cognitive Modeling for Robust Object Extraction and Tracking

Temporo-Spatial Parallel Sparse Memory Networks for Efficient Video Object Segmentation

Retinomorphic Object Detection in Asynchronous Visual Streams

Adaptive Focus for Efficient Video Recognition

You Don't Only Look Once: Constructing Spatial-Temporal Memory for Integrated 3D Object Detection and Tracking

Adaptive Memory Management for Video Object Segmentation