ClickTrack: Towards Real-time Interactive Single Object Tracking

Kuiran Wang, Xuehui Yu, Wenwen Yu, Guorong Li, Xiangyuan Lan, Qixiang Ye, Jianbin Jiao, Zhenjun Han
2024-11-24
Abstract: Single object tracking (SOT) relies on precise initialization of the object bounding box. In this paper, we reconsider the deficiencies of current approaches to initializing single object trackers and propose ClickTrack, a new paradigm that uses click interaction for real-time scenarios. However, a click as an input type inherently lacks hierarchical information. To address the resulting ambiguity in certain scenarios, we design the Guided Click Refiner (GCR), which accepts a point and optional textual information as inputs and transforms the point into the bounding box expected by the operator. This bounding box is then used as the input to a single object tracker. Experiments on the LaSOT and GOT-10k benchmarks show that a tracker combined with GCR achieves stable performance in real-time interactive scenarios. Furthermore, we explore integrating GCR into the Segment Anything Model (SAM), which significantly reduces ambiguity when SAM receives point inputs.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of initializing single-object tracking (SOT) in real-time interactive scenarios. Existing SOT algorithms depend heavily on an accurate target bounding box annotation in the first frame, and such annotations are difficult to obtain in real-time interactive scenarios. The paper therefore proposes a new paradigm, ClickTrack, which initializes the single-object tracker through click-based interaction.

### Main Problems and Solutions

1. **Limitations of Initialization Methods**:
   - **Detector-based Initialization**: The detector may output multiple bounding boxes, and it cannot specify the target when detection fails.
   - **Natural-Language-Description-based Initialization**: It requires accurate language descriptions, which increases interaction time and instability, and it struggles to locate the target accurately in multi-object scenarios.
   - **Limitations of Click-based Interaction**: Although a click is fast, simple, and precise, it lacks hierarchical information, so in some scenarios the model cannot determine the intended target accurately.
2. **Proposed Solutions**:
   - **Guided Click Refiner (GCR)**: To compensate for the missing hierarchical information in click input, the paper designs the GCR module. GCR takes the click point and optional text information as inputs and converts the click point into the desired bounding box, which is then used to initialize the single-object tracker.
     - **Guided Convolution (GC)**: The core structure, which uses the guiding information to steer the regression process so that the generated bounding box meets the operator's expectations.
     - **Prototype Selection (PS)**: Selects the most appropriate initial regression area and excludes the influence of the background or interfering objects.
     - **Iterative Refinement (IR)**: Optimizes the regression results over multiple iterations to improve prediction accuracy.

### Experimental Verification

The paper reports extensive experiments on the LaSOT and GOT-10k benchmark datasets to verify the effectiveness of GCR. The results show that, compared with traditional initialization methods, GCR provides more stable and accurate tracking performance in real-time interactive scenarios.

### Key Contributions

- Propose a new SOT paradigm, ClickTrack, which initializes the single-object tracker through click-based interaction.
- Design the GCR model, which addresses the lack of hierarchical information in click input and improves initialization accuracy.
- Show experimentally the advantages of GCR in real-time interactive scenarios; combining the click with text information in particular significantly reduces ambiguity.

Through these improvements, ClickTrack provides new ideas and technical support for applying single-object tracking in real-time interactive scenarios.
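
The click-to-box-to-tracker flow summarized above can be illustrated with a minimal Python sketch. It is not the authors' implementation: `click_to_box`, `make_toy_step`, `ToyTracker`, the initial box size, and the number of iterations are all hypothetical names and values chosen for illustration. The learned regressor is replaced by a toy step that moves the box halfway toward a known target, and Guided Convolution and Prototype Selection are not modeled.

```python
from typing import Callable, Optional, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
# Hypothetical signature for one refinement step: (box, click, text) -> box
RefineStep = Callable[[Box, Point, Optional[str]], Box]


def click_to_box(
    click: Point,
    refine_step: RefineStep,
    text: Optional[str] = None,
    init_half_size: float = 16.0,  # assumed size of the starting box
    num_iters: int = 3,            # assumed number of refinement rounds
) -> Box:
    """Turn a click (plus optional text) into a box by repeated refinement."""
    x, y = click
    # Start from a small square centered on the click.
    box: Box = (x - init_half_size, y - init_half_size,
                x + init_half_size, y + init_half_size)
    # Iterative refinement: feed each prediction back in as the next input.
    for _ in range(num_iters):
        box = refine_step(box, click, text)
    return box


def make_toy_step(target: Box) -> RefineStep:
    """Stand-in for a learned regressor: move halfway toward a known box."""
    def step(box: Box, click: Point, text: Optional[str]) -> Box:
        # The click and text are ignored here; a real model would use them.
        x1, y1, x2, y2 = (b + 0.5 * (t - b) for b, t in zip(box, target))
        return (x1, y1, x2, y2)
    return step


class ToyTracker:
    """Placeholder tracker that only stores its initialization box."""
    def __init__(self) -> None:
        self.template: Optional[Box] = None

    def init(self, box: Box) -> None:
        self.template = box


click = (60.0, 120.0)
step = make_toy_step(target=(10.0, 20.0, 110.0, 220.0))
box = click_to_box(click, step, text="the person on the left")

tracker = ToyTracker()
tracker.init(box)  # the refined box replaces a hand-drawn first-frame box
print(box)  # (14.25, 30.5, 105.75, 209.5)
```

In this toy setting each round halves the remaining error, so three rounds leave one eighth of the initial gap to the target box. A learned refiner would have no such guarantee; the sketch only shows how the pieces connect.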