Abstract: Recently, memory-based networks have achieved promising performance in video object segmentation (VOS). However, existing methods still suffer from unsatisfactory segmentation accuracy and limited efficiency. There are two main reasons: 1) during memory construction, the inflexible memory storage mechanism weakens the model's ability to discriminate similar appearances in complex scenarios, which in turn leads to video-level temporal redundancy; and 2) during memory reading, matching robustness and memory retrieval accuracy decrease as the number of video frames increases. To address these challenges, we propose an adaptive sparse memory network (ASM) that performs VOS efficiently and effectively by sparsely leveraging previous guidance while attending to key information. Specifically, we design an adaptive sparse memory constructor (ASMC) that adaptively memorizes informative past frames according to the dynamic temporal changes across video frames. Furthermore, we introduce an attentive local memory reader (ALMR) that quickly retrieves relevant information from a subset of the memory, thereby reducing frame-level redundant computation and noise in a simpler and more convenient manner. To prevent key features from being discarded when only a subset of the memory is read, we further propose a novel attentive local feature aggregation (ALFA) module, which preserves useful cues by selectively aggregating discriminative spatial dependencies from adjacent frames, effectively enlarging the receptive field of each memory frame. Extensive experiments demonstrate that our model achieves state-of-the-art performance at real-time speed on six popular VOS benchmarks. Moreover, ASM can be applied to existing memory-based methods as a generic plug-in to achieve significant performance improvements. More importantly, our method is robust when handling sparse videos with low frame rates.
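Two of the ideas above, writing a frame to memory only when the video has changed enough and reading from a subset of the memory rather than all of it, can be illustrated with a toy sketch. This is not the authors' ASMC or ALMR. It assumes one global feature vector per frame (real VOS memories store per-pixel keys and values), uses a hypothetical cosine-distance threshold `change_threshold` as the write criterion, and uses top-k key selection (`top_k`) as a stand-in for the "subset of memory". The class and method names are hypothetical, and ALFA is not modeled.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm > 0 else 0.0


class SparseMemory:
    """Hypothetical sparse memory (illustration only, not the paper's design).

    Write: a frame is stored only if its key has drifted far enough from the
    most recently stored key (an assumed proxy for "informative" frames).
    Read: attention is restricted to the top_k most similar stored keys
    (an assumed proxy for reading a subset of the memory).
    """

    def __init__(self, change_threshold: float = 0.2) -> None:
        self.change_threshold = change_threshold  # assumed hyperparameter
        self.keys: List[Vector] = []
        self.values: List[Vector] = []

    def maybe_write(self, key: Vector, value: Vector) -> bool:
        # Always store the first (annotated) frame.
        if self.keys:
            change = 1.0 - cosine_similarity(key, self.keys[-1])
            if change <= self.change_threshold:
                return False  # too similar to the last stored frame: skip it
        self.keys.append(list(key))
        self.values.append(list(value))
        return True

    def read(self, query: Vector, top_k: int = 2) -> Vector:
        # Dot-product affinity between the query and every stored key.
        scores = [sum(q * k for q, k in zip(query, key)) for key in self.keys]
        # Keep only the indices of the top_k highest-scoring memory entries.
        top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
        # Softmax over the selected subset (max subtracted for numerical stability).
        m = max(scores[i] for i in top)
        weights = [math.exp(scores[i] - m) for i in top]
        total = sum(weights)
        # Weighted sum of the selected values.
        out = [0.0] * len(self.values[0])
        for w, i in zip(weights, top):
            for d, v in enumerate(self.values[i]):
                out[d] += (w / total) * v
        return out


# Usage: the second frame barely differs from the first, so it is skipped.
memory = SparseMemory(change_threshold=0.2)
frames = [([1.0, 0.0], [1.0, 0.0]), ([0.99, 0.05], [0.9, 0.1]), ([0.0, 1.0], [0.0, 1.0])]
written = [memory.maybe_write(k, v) for k, v in frames]
print(written)
print(memory.read([1.0, 0.0], top_k=1))
```

Storing only frames whose features change appreciably keeps the memory small on slowly changing video, and restricting the softmax to a few entries bounds the read cost regardless of video length; both are simplifications chosen for clarity rather than fidelity to the paper.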


Adaptive Sparse Memory Networks for Efficient and Robust Video Object Segmentation