HFVOS: History-Future Integrated Dynamic Memory for Video Object Segmentation

Wanyun Li, Jack Fan, Pinxue Guo, Lingyi Hong, Wei Zhang
DOI: https://doi.org/10.1109/tcsvt.2024.3404469
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Memory-based methods have substantially improved the precision of video object segmentation (VOS) by storing features in an expanding memory bank. However, this comes at the cost of increased computation and storage overhead. While recent methods have sought to alleviate this issue via compression or selection strategies, their reliance solely on history cues and simple memory structures results in precision degradation and intrinsic limitations, such as error accumulation and poor robustness. In this paper, we introduce HFVOS, an efficient yet effective framework that improves both the speed and precision of VOS through a memory design with low redundancy, high accuracy, and adaptability. First, we construct a novel hierarchical memory update pipeline with the proposed Buffered Memory Mechanism, which incorporates both future and history cues to reduce redundancy and improve the utility of memory. Second, we propose an Adaptive Dual-stream Selection Network (ADSN) to carry out the adaptive selection and drop operations of the memory update, and integrate an ADSN-based long-term memory to enhance robustness, especially for long videos. Furthermore, to boost HFVOS, a progressive selection loss is designed to help ADSN gradually adapt to fewer features while preserving high precision. Experiments show that HFVOS achieves state-of-the-art segmentation precision and speed on both short-term datasets (DAVIS-17 val: 86.8% J & F and 33.0 FPS; DAVIS-16 val: 92.0% J & F and 42.0 FPS) and long-term datasets (LVOS val: 58.0% J & F and 37.4 FPS). Code will be available at https://github.com/L599wy/HFVOS.