Abstract: Joint Detection and Embedding (JDE) trackers have demonstrated excellent performance in Multi-Object Tracking (MOT) by embedding the Re-Identification (ReID) task into the detector, so that appearance features are extracted as an auxiliary task; this balances inference speed and tracking performance. However, resolving the competition between the detector and the feature extractor has remained a challenge. Meanwhile, the problems caused by directly embedding the ReID task into MOT remain unresolved: the appearance features lack high discriminability, which limits their utility. In this paper, a new learning approach using cross-correlation to capture temporal information of objects is proposed. The feature extraction network is no longer trained solely on appearance features from each frame but learns richer motion features by utilizing feature heatmaps from consecutive frames, which addresses the challenge of inter-class feature similarity. Furthermore, our learning approach is applied to a more lightweight feature extraction network, and the feature matching scores are treated as strong cues rather than auxiliary cues, with an appropriate weight calculation to reflect the compatibility between our obtained features and the MOT task. Our tracker, named TCBTrack, achieves state-of-the-art performance on multiple public benchmarks, i.e., the MOT17, MOT20, and DanceTrack datasets. Specifically, on the DanceTrack test set, we achieve 56.8 HOTA, 58.1 IDF1 and 92.5 MOTA, making it the best online tracker capable of achieving real-time performance. Comparative evaluations with other trackers show that our tracker achieves the best balance between speed, robustness and accuracy. Code is available at https://github.com/yfzhang1214/TCBTrack.
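
As a rough illustration of the cross-correlation idea mentioned in the abstract, the sketch below correlates one object's embedding taken from the previous frame with every location of the current frame's feature map, producing a response heatmap whose peak suggests where the object moved. This is not the authors' implementation: the function names (`cosine`, `correlation_heatmap`, `peak_location`), the use of cosine similarity, the toy feature-map sizes, and the random data are all assumptions made for illustration only.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + 1e-8)


def correlation_heatmap(template, feature_map):
    """Correlate a C-dim template with an H x W grid of C-dim embeddings.

    feature_map[y][x] is the embedding at grid cell (y, x).
    Returns an H x W list of similarity scores.
    """
    return [[cosine(template, cell) for cell in row] for row in feature_map]


def peak_location(heatmap):
    """Return the (y, x) grid cell with the highest response."""
    best = max(
        (value, y, x)
        for y, row in enumerate(heatmap)
        for x, value in enumerate(row)
    )
    return best[1], best[2]


# Toy data (hypothetical): two random 8 x 8 feature maps with 16 channels.
random.seed(0)
C, H, W = 16, 8, 8


def random_map():
    return [[[random.gauss(0, 1) for _ in range(C)] for _ in range(W)]
            for _ in range(H)]


prev_feat = random_map()
curr_feat = random_map()

# Pretend the object at cell (2, 3) in the previous frame moved to (4, 5).
template = prev_feat[2][3]
curr_feat[4][5] = list(template)

heat = correlation_heatmap(template, curr_feat)
peak = peak_location(heat)
print(peak)
```

In a trained tracker the embeddings would come from the feature extraction network rather than random numbers, and the heatmap would serve as a training or matching signal; the sketch only shows the correlation step itself.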

Real-time multi-class object detection using two-dimensional index

Multi-objects Real Time Recognition Based on Color Information

Realtime object matching with robust dominant orientation templates

Realtime and Robust Object Matching with a Large Number of Templates

Multi-view Aggregation for Real-Time Accurate Object Detection of a Moving Camera

Real-time Object Classification in Video Surveillance Based on Appearance Learning

A fast template matching algorithm based on principal orientation difference

Object-Level Pseudo-3D Lifting for Distance-Aware Tracking

Novel Framework for Multi-view Object Detection through Combining Multiple Classifiers

Real-Time Cascade Template Matching for Object Instance Detection

Real-Time Multiple Object Tracking with Discriminative Features

Real-time Multi-Object Tracking Based on Bi-directional Matching

Real-time object retrieval with dominant orientation template matching improved by pyramid scoring

Multi-strategy object tracking in complex situation for video surveillance

Occlusion-Aware Real-Time Object Tracking

Real-time Detection of the Moving Object in Video Sequences

Towards Real-Time Multi-Object Tracking

Visual tracking by dynamic matching-classification network switching

Multi-modal Queried Object Detection in the Wild

Temporal Correlation Meets Embedding: Towards a 2nd Generation of JDE-based Real-Time Multi-Object Tracking

Real-Time Online Multi-Object Tracking