IDPro: Flexible Interactive Video Object Segmentation by ID-queried Concurrent Propagation

Kexin Li, Tao Jiang, Zongxin Yang, Yi Yang, Yueting Zhuang, Jun Xiao
DOI: https://doi.org/10.1109/tcsvt.2024.3431714
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Interactive Video Object Segmentation (iVOS) is inherently demanding because it requires real-time interaction between humans and computers. A good user experience depends on user input habits, segmentation quality, running time, and memory consumption. However, existing methods compromise user experience by supporting only a single input mode and by running slowly. Specifically, these approaches restrict user interaction to a single frame, which limits how users can express their intent. To overcome these limitations and better align with user habits, we introduce a framework that supports flexible input modes through ID-queried concurrent propagation (IDPro). In particular, we devise the Across-Frame Interaction (AFI) module, which allows users to freely annotate various objects across multiple frames. The AFI module transfers scribble information across interactive frames to generate multi-frame masks. Additionally, we leverage an ID-queried mechanism to process multiple objects. To achieve more efficient propagation and a lightweight model, we propose a truncated re-propagation strategy that employs an across-round memory storing crucial interaction information, replacing the multi-round fusion module used previously. Our SwinB-IDPro attains new state-of-the-art performance on DAVIS 2017 (89.6%, J&F@60). Furthermore, our R50-IDPro runs more than 3× faster than the leading competitor in challenging multi-object scenarios.