Abstract: Supervised trackers trained on labeled data dominate the single object tracking field owing to their superior tracking accuracy. However, their labeling cost and high computational complexity hinder their application on edge devices. Unsupervised learning methods have also been investigated to reduce the labeling cost, but their complexity remains high. Aiming at lightweight high-performance tracking, feasibility without offline pre-training, and algorithmic transparency, we propose a new single object tracking method, called the green object tracker (GOT), in this work. GOT conducts an ensemble of three prediction branches for robust box tracking: 1) a global object-based correlator to predict the object location roughly, 2) a local patch-based correlator to build temporal correlations of small spatial units, and 3) a superpixel-based segmentator to exploit the spatial information of the target frame. GOT offers tracking accuracy competitive with state-of-the-art unsupervised trackers, which demand heavy offline pre-training, at a lower computation cost. GOT has a tiny model size (<3k parameters) and low inference complexity (around 58M FLOPs per frame). Since its inference complexity is between 0.1% and 10% of that of deep learning (DL) trackers, it can be easily deployed on mobile and edge devices.
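As a quick sanity check on the complexity claim, the two figures quoted in the abstract (about 58M FLOPs per frame, and 0.1% to 10% of a DL tracker's cost) imply that the DL trackers used for comparison cost roughly 0.58 to 58 GFLOPs per frame. The sketch below only reproduces that arithmetic; the function and constant names are illustrative and do not come from the paper.

```python
# Figures quoted in the abstract.
GOT_FLOPS_PER_FRAME = 58e6            # "around 58M FLOPs per frame"
RATIO_LOW, RATIO_HIGH = 0.001, 0.10   # "0.1%-10% of DL trackers"


def implied_dl_flops(got_flops: float, ratio: float) -> float:
    """Per-frame FLOPs of a tracker for which `got_flops` is `ratio` of its cost."""
    return got_flops / ratio


# The smallest ratio corresponds to the most expensive DL tracker, and vice versa.
heaviest = implied_dl_flops(GOT_FLOPS_PER_FRAME, RATIO_LOW)
lightest = implied_dl_flops(GOT_FLOPS_PER_FRAME, RATIO_HIGH)

print(f"Implied DL tracker cost: {lightest / 1e9:.2f} to {heaviest / 1e9:.0f} GFLOPs per frame")
```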

What problem does this paper attempt to address?

This paper aims to develop an unsupervised, lightweight, high-performance tracker for single-object tracking. It proposes a new method, the Green Object Tracker (GOT), with three goals:

1. **No offline pre-training**: GOT avoids the large amount of labeled data that supervised learning requires and reduces computational complexity, so the model can be deployed easily on edge devices.
2. **Algorithmic transparency**: A modular design makes the algorithm easier to understand and optimize.
3. **Low computational cost**: GOT has very few model parameters (<3k) and low inference complexity (about 58M FLOPs per frame). Its inference complexity is only 0.1% to 10% of that of deep-learning trackers, which makes it suitable for mobile and edge devices.

To achieve these goals, GOT uses an ensemble of three prediction branches to improve the robustness and accuracy of tracking:

1. **Global object-based correlator**: Roughly predicts the object location.
2. **Local patch-based correlator**: Builds temporal correlations of small spatial units, which allows more flexible shape estimation and object re-identification.
3. **Superpixel-based segmentator**: Uses the spatial information of the target frame, such as color similarity and geometric constraints, to generate multiple candidate boxes.

GOT produces the final tracking result by fusing the outputs of these three branches. It also introduces additional strategies, such as a two-stage training strategy for the local patch classifier and a heat-map-based noise suppression method, to further improve tracking performance.

Overall, the main contribution of the paper is a single-object tracking method that achieves high performance at low computational cost while remaining unsupervised and requiring no offline pre-training.
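The summary does not say how the three branch outputs are fused, so the following is only a minimal sketch under a stated assumption: among the candidate boxes from the segmentation branch, pick the one whose intersection over union (IoU) agrees best with the boxes predicted by the two correlators. The names `iou` and `fuse`, and the selection rule itself, are hypothetical illustrations and are not the authors' fusion method.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def fuse(global_box: Box, local_box: Box, candidates: List[Box]) -> Box:
    """Hypothetical fusion rule (an assumption, not taken from the paper):
    return the candidate that overlaps most with both correlator boxes,
    falling back to the global prediction when there are no candidates."""
    if not candidates:
        return global_box
    return max(candidates, key=lambda c: iou(c, global_box) + iou(c, local_box))


# Toy example with made-up coordinates.
best = fuse(
    global_box=(10.0, 10.0, 50.0, 50.0),
    local_box=(12.0, 12.0, 52.0, 52.0),
    candidates=[(11.0, 11.0, 51.0, 51.0), (30.0, 30.0, 90.0, 90.0)],
)
print(best)
```

In this toy example the first candidate overlaps both correlator boxes far more than the second, so it is the one selected. A real tracker would likely weigh the branches by confidence rather than treat them equally.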

Unsupervised Green Object Tracker (GOT) without Offline Pre-training
