Unsupervised Green Object Tracker (GOT) without Offline Pre-training

Zhiruo Zhou, Suya You, C.-C. Jay Kuo
DOI: https://doi.org/10.48550/arXiv.2309.09078
2023-09-17
Abstract: Supervised trackers trained on labeled data dominate the single object tracking field for superior tracking accuracy. The labeling cost and the huge computational complexity hinder their applications on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers competitive tracking accuracy with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1%-10% of DL trackers, it can be easily deployed on mobile and edge devices.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper aims to develop an unsupervised, lightweight, high-performance tracker for single object tracking. It proposes a new method, called the Green Object Tracker (GOT), with the following goals:

1. **No offline pre-training**: GOT avoids the large amount of labeled data that supervised learning requires and also reduces computational complexity, so the model can be deployed easily on edge devices.
2. **Algorithmic transparency**: A modular design makes the algorithm easier to understand and optimize.
3. **Low computational cost**: GOT has a tiny model size (<3k parameters) and low inference complexity (about 58M FLOPs per frame). Its inference complexity is only 0.1%-10% of that of deep-learning trackers, which makes it suitable for mobile and edge devices.

To achieve these goals, GOT uses an ensemble of three prediction branches to improve the robustness and accuracy of tracking:

1. **Global object-based correlator**: Predicts the object location roughly.
2. **Local patch-based correlator**: Builds temporal correlations of small spatial units, which allows more flexible shape estimation and object re-identification.
3. **Superpixel-based segmentator**: Exploits the spatial information of the target frame, such as color similarity and geometric constraints, to generate multiple candidate boxes.

GOT produces the final tracking result by fusing the outputs of these three branches. It also introduces further strategies, such as a two-stage training strategy for the local patch classifier and a heat-map-based noise suppression method, to improve tracking performance.
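This summary does not say how the outputs of the three branches are fused. As a purely illustrative sketch, and not the rule used in the paper, the idea of combining a rough global box with candidate boxes from the other branches could look like the following. The names `iou` and `fuse_boxes`, the box format, and the "highest overlap wins" rule are all assumptions made for this example.

```python
from typing import List, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse_boxes(global_box: Box, candidates: List[Box]) -> Box:
    """Hypothetical fusion rule (an assumption, not the paper's method).

    Pick the candidate box that overlaps most with the rough box from the
    global correlator. If there are no candidates, or none overlaps the
    global box at all, fall back to the global box itself.
    """
    best = max(candidates, key=lambda c: iou(global_box, c), default=None)
    if best is None or iou(global_box, best) == 0.0:
        return global_box
    return best


# Usage: one rough global prediction and two candidates from other branches.
rough = (10.0, 10.0, 50.0, 50.0)
proposals = [(12.0, 11.0, 52.0, 49.0), (100.0, 100.0, 140.0, 140.0)]
final_box = fuse_boxes(rough, proposals)
```

A selection rule like this keeps each branch independent and easy to inspect, which is in the spirit of the modular design described above, but the actual fusion logic should be taken from the paper itself.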
In general, the main contribution of this paper is a single object tracking method that achieves high performance at low computational cost while being unsupervised and requiring no offline pre-training.
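To put the quoted complexity figures in perspective, the short calculation below works out what "0.1%-10% of deep-learning trackers" implies for those trackers, taking the 58M FLOPs per frame figure at face value. The helper name `implied_baseline_flops` is made up for this example, and the result is only the range implied by the stated percentages, not a measurement of any particular tracker.

```python
# "around 58M FLOPs per frame" as stated in the abstract.
GOT_FLOPS_PER_FRAME = 58_000_000


def implied_baseline_flops(got_flops: int, percent: float) -> float:
    """Per-frame FLOPs of a tracker for which GOT's cost is `percent` percent."""
    return got_flops * 100.0 / percent


# GOT at 10% of a baseline implies the lightest deep-learning tracker in the range.
lightest = implied_baseline_flops(GOT_FLOPS_PER_FRAME, 10.0)
# GOT at 0.1% of a baseline implies the heaviest one.
heaviest = implied_baseline_flops(GOT_FLOPS_PER_FRAME, 0.1)

print(f"{lightest / 1e6:.0f}M to {heaviest / 1e9:.0f}G FLOPs per frame")
```

Under these figures, the deep-learning trackers being compared against would need roughly 580M to 58G FLOPs per frame.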