Motion Cues Guided Feature Aggregation and Enhancement for Video Object Segmentation

Xuejun Li, Wenming Zheng, Yuan Zong
DOI: https://doi.org/10.1016/j.neucom.2022.03.064
IF: 6
2022-01-01
Neurocomputing
Abstract: Video object segmentation (VOS) aims to separate unknown target objects from given video sequences. Although many recent methods have improved VOS performance, especially those using deep convolutional neural networks (CNNs), it remains difficult to aggregate deep features and motion cues effectively, which is important for associating valid information across adjacent frames. To tackle this problem, we propose a simple yet effective feature optimization method for VOS based on motion information. Specifically, we construct a two-branch deep network and use computed motion cues (i.e., optical flow) to jointly optimize global and local interframe correlation information. Additionally, we propose a clustering-based feature enhancement module that further fuses motion information and enhances the feature saliency of the target area. The optimized feature maps yield a significant performance improvement on the final VOS task, especially for sequences with rapid target movement. Experiments on the DAVIS16, DAVIS17, YouTube-Objects and YouTube-VOS datasets demonstrate that our simple feature aggregation and enhancement method improves segmentation accuracy effectively and achieves impressive results compared with many state-of-the-art methods.
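The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of using optical flow to aggregate features from an adjacent frame; it is not the authors' network. It assumes a backward flow field (each current-frame pixel points to its location in the previous frame), nearest-neighbour sampling, and a cosine-similarity weighting. The function names `warp_features` and `aggregate` are hypothetical.

```python
import numpy as np

def warp_features(feat, flow):
    """Warp a (C, H, W) feature map with a (2, H, W) flow field.

    Assumption: flow[0] is the x-displacement and flow[1] the y-displacement
    from each current-frame pixel to its source location in `feat`.
    Nearest-neighbour sampling keeps the sketch short; bilinear sampling
    is the more common choice in practice.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    src_x = np.clip(np.rint(xs + flow[0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[1]).astype(int), 0, h - 1)
    return feat[:, src_y, src_x]

def aggregate(cur_feat, prev_feat, flow, eps=1e-8):
    """Blend current features with flow-aligned previous-frame features.

    The per-pixel weight of the warped features comes from a two-way softmax
    over cosine similarities (the current frame has similarity 1 with itself).
    This weighting is an illustrative assumption, not the paper's module.
    """
    warped = warp_features(prev_feat, flow)
    num = (cur_feat * warped).sum(axis=0)
    den = np.linalg.norm(cur_feat, axis=0) * np.linalg.norm(warped, axis=0) + eps
    sim = num / den  # (H, W), values in [-1, 1]
    w_prev = np.exp(sim) / (np.exp(1.0) + np.exp(sim))
    return (1.0 - w_prev) * cur_feat + w_prev * warped

# Example call on random data: 8 channels on a 4x5 grid with zero motion.
rng = np.random.default_rng(0)
cur = rng.normal(size=(8, 4, 5))
prev = rng.normal(size=(8, 4, 5))
fused = aggregate(cur, prev, np.zeros((2, 4, 5)))
```

With this weighting, previous-frame features that align well with the current frame contribute up to half of the fused result, while poorly aligned ones are down-weighted.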