Toward High Quality Multi-Object Tracking and Segmentation Without Mask Supervision

Wensheng Cheng, Yi Wu, Zhenyu Wu, Haibin Ling, Gang Hua
DOI: https://doi.org/10.1109/tip.2024.3403497
IF: 10.6
2024-06-07
IEEE Transactions on Image Processing
Abstract: Recent studies have shown the potential of weakly supervised multi-object tracking and segmentation, but two drawbacks remain unresolved: coarse pseudo mask labels and limited use of temporal information. To address these issues, we present a framework that directly uses box labels to supervise the segmentation network without resorting to pseudo mask labels. In addition, we propose to fully exploit temporal information from two perspectives. First, we integrate optical flow-based pairwise consistency to enforce mask consistency across frames, thereby improving mask quality for segmentation. Second, we propose a temporally adjacent pair-based sampling strategy to adapt instance embedding learning for data association in tracking. We combine these techniques into an end-to-end deep model, named BoxMOTS, which requires only box annotations and no mask supervision. Extensive experiments demonstrate that our model surpasses the current state of the art by a large margin and produces promising results on KITTI MOTS and BDD100K MOTS. The source code is available at https://github.com/Spritea/BoxMOTS.
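The abstract does not give the loss formulas. The NumPy sketch below illustrates how box-only mask supervision and flow-based temporal pairwise consistency are commonly formulated. It assumes BoxInst-style losses: a Dice loss between the max-projections of the predicted mask and of the box onto each axis, plus a color-gated pairwise term applied here to pixel pairs linked by optical flow across frames. It is not BoxMOTS's actual implementation. The function names, the color threshold `tau`, the temperature `theta`, and the nearest-pixel flow lookup are hypothetical choices made for illustration.

```python
import numpy as np


def dice_loss(pred, target, eps=1e-6):
    """Dice loss between two 1-D profiles with values in [0, 1]."""
    inter = np.sum(pred * target)
    union = np.sum(pred ** 2) + np.sum(target ** 2)
    return 1.0 - 2.0 * inter / (union + eps)


def box_projection_loss(mask_prob, box):
    """Box-only supervision (BoxInst-style assumption, hypothetical helper).

    mask_prob: (H, W) foreground probabilities.
    box: (x0, y0, x1, y1) integer pixel coords, upper bounds exclusive.
    The mask's max-projection onto each axis should match the box's.
    """
    H, W = mask_prob.shape
    x0, y0, x1, y1 = box
    box_mask = np.zeros((H, W))
    box_mask[y0:y1, x0:x1] = 1.0
    loss_x = dice_loss(mask_prob.max(axis=0), box_mask.max(axis=0))  # column profile
    loss_y = dice_loss(mask_prob.max(axis=1), box_mask.max(axis=1))  # row profile
    return loss_x + loss_y


def temporal_pairwise_loss(mask_t, mask_t1, img_t, img_t1, flow,
                           tau=0.3, theta=0.5, eps=1e-6):
    """Flow-linked pixels with similar color should share a mask label.

    mask_t, mask_t1: (H, W) foreground probabilities in frames t and t+1.
    img_t, img_t1: (H, W, 3) images with values in [0, 1].
    flow: (H, W, 2) forward flow (dx, dy) from frame t to frame t+1.
    Hypothetical simplification: flow targets are rounded to the nearest pixel.
    """
    H, W = mask_t.shape
    ys, xs = np.mgrid[0:H, 0:W]
    xs1 = np.rint(xs + flow[..., 0]).astype(int)
    ys1 = np.rint(ys + flow[..., 1]).astype(int)
    # Drop correspondences that leave the frame.
    valid = (xs1 >= 0) & (xs1 < W) & (ys1 >= 0) & (ys1 < H)
    ty, tx = ys1[valid], xs1[valid]

    # Color similarity gates which pairs are trusted to share a label.
    dist = np.linalg.norm(img_t[valid] - img_t1[ty, tx], axis=-1)
    keep = np.exp(-dist / theta) >= tau
    if not np.any(keep):
        return 0.0

    p = mask_t[valid][keep]
    q = mask_t1[ty, tx][keep]
    # Probability that both pixels are foreground or both are background.
    agree = p * q + (1.0 - p) * (1.0 - q)
    return float(np.mean(-np.log(np.clip(agree, eps, 1.0))))
```

With zero flow and identical frames, a mask that is consistent over time gives a temporal loss of 0, and a mask that exactly fills its box gives a near-zero projection loss. A mask that flips from all-foreground to all-background between frames is heavily penalized.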
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
What problem does this paper attempt to address?