Kernel based local matching network for video object segmentation

Guoqiang Wang, Lan Li, Min Zhu, Rui Zhao, Xiang Zhang
DOI: https://doi.org/10.1007/s00138-024-01524-4
IF: 2.983
2024-03-26
Machine Vision and Applications
Abstract: Methods based on space-time memory networks have recently achieved strong performance in semi-supervised video object segmentation and have attracted wide attention. However, these methods share a serious limitation: because they rely on non-local matching, they are prone to interference from similar-looking objects, which substantially limits segmentation performance. To solve this problem, we propose a Kernel-guided Attention Matching Network (KAMNet) that uses local matching instead of non-local matching. First, KAMNet uses a spatio-temporal attention mechanism to enhance the model's discrimination between foreground objects and background areas. Then, KAMNet uses a Gaussian kernel to guide the matching between the current frame and the reference set. Because the Gaussian kernel decays away from its center, it restricts matching to the central region, thus achieving local matching. KAMNet achieves a favorable speed-accuracy trade-off on the benchmark datasets DAVIS 2016 (87.6%) and DAVIS 2017 (76.0%) at 0.12 seconds per frame.
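The idea of kernel-guided local matching can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `gaussian_local_matching`, the single reference frame, the choice to center each kernel on the query pixel's own grid position, and the `sigma` value are all assumptions made for illustration. The sketch only shows how multiplying an attention map by a spatial Gaussian suppresses matches far from the kernel center.

```python
import numpy as np

def gaussian_local_matching(query, memory, height, width, sigma=2.0):
    """Hypothetical sketch of Gaussian-weighted matching (not KAMNet itself).

    query:  (height*width, C) features of the current frame
    memory: (height*width, C) features of one reference frame (assumed)
    Returns a (height*width, height*width) row-normalized matching matrix.
    """
    # Grid coordinates of every pixel, flattened in row-major order.
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(np.float64)

    # Squared spatial distance between every query/memory pixel pair.
    sq_dist = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)

    # Gaussian kernel: 1 at the center, decaying with distance.
    kernel = np.exp(-sq_dist / (2.0 * sigma ** 2))

    # Non-local similarity (scaled dot product), shifted for stability.
    scores = query @ memory.T / np.sqrt(query.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)

    # Kernel-weighted softmax: distant matches are down-weighted.
    weights = np.exp(scores) * kernel
    weights /= weights.sum(axis=1, keepdims=True)
    return weights

rng = np.random.default_rng(0)
h, w, c = 4, 5, 8
q = rng.normal(size=(h * w, c))
m = rng.normal(size=(h * w, c))
match = gaussian_local_matching(q, m, h, w, sigma=1.0)
print(match.shape)
```

With a small `sigma`, each query pixel attends almost exclusively to nearby memory pixels; as `sigma` grows, the result approaches ordinary non-local softmax matching.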
computer science, cybernetics, artificial intelligence, engineering, electrical & electronic
What problem does this paper attempt to address?