Abstract: Multi-object tracking (MOT) emerges as a pivotal and highly promising branch in the field of computer vision. Classical closed-vocabulary MOT (CV-MOT) methods aim to track objects of predefined categories. Recently, some open-vocabulary MOT (OV-MOT) methods have successfully addressed the problem of tracking unknown categories. However, we found that the CV-MOT and OV-MOT methods each struggle to excel in the tasks of the other. In this paper, we present a unified framework, Associate Everything Detected (AED), that simultaneously tackles CV-MOT and OV-MOT by integrating with any off-the-shelf detector and supports unknown categories. Different from existing tracking-by-detection MOT methods, AED gets rid of prior knowledge (e.g., motion cues) and relies solely on highly robust feature learning to handle complex trajectories in OV-MOT tasks while keeping excellent performance in CV-MOT tasks. Specifically, we model the association task as a similarity decoding problem and propose a sim-decoder with an association-centric learning mechanism. The sim-decoder calculates similarities in three aspects: spatial, temporal, and cross-clip. Subsequently, association-centric learning leverages these threefold similarities to ensure that the extracted features are appropriate for continuous tracking and robust enough to generalize to unknown categories. Compared with existing powerful OV-MOT and CV-MOT methods, AED achieves superior performance on TAO, SportsMOT, and DanceTrack without any prior knowledge. Our code is available at https://github.com/balabooooo/AED.

What problem does this paper attempt to address?

This paper addresses two key problems in multi-object tracking (MOT):

1. **The gap between closed-vocabulary MOT (CV-MOT) and open-vocabulary MOT (OV-MOT)**:
   - Closed-vocabulary MOT methods track only objects of predefined categories, such as people and cars. They perform well on known categories but poorly on unknown ones.
   - Open-vocabulary MOT methods adapt to a wider range of categories, including those not seen during training, but on certain specific categories they track less well than fine-tuned closed-vocabulary MOT methods.
2. **The dependence of existing MOT methods on prior knowledge**:
   - Existing MOT methods usually rely on motion cues or other prior knowledge to associate objects, which can degrade performance on complex motion patterns or unknown categories.

To solve these problems, the authors propose a unified framework, **Associate Everything Detected (AED)**. AED addresses them in the following ways:

- **Unifying CV-MOT and OV-MOT tasks**: AED handles both closed-vocabulary and open-vocabulary MOT tasks within the same framework and supports the tracking of unknown categories.
- **Removing the dependence on prior knowledge**: AED relies only on robust feature learning to handle complex trajectories, without prior knowledge such as motion cues.
- **Introducing an association-centric learning mechanism**: To ensure that the extracted features are suitable for continuous tracking and generalize to unknown categories, AED pairs a sim-decoder with an association-centric learning mechanism. The sim-decoder calculates similarities in three aspects: spatial, temporal, and cross-clip.

Specifically, AED models the association task as a similarity decoding problem and uses the sim-decoder to calculate the similarity between object queries and trajectory queries. In addition, AED uses contrastive learning during training to strengthen the model's spatio-temporal consistency and long-term ID consistency, thereby improving tracking performance.

In summary, the goal of AED is to provide a general and robust multi-object tracking solution that handles targets of both known and unknown categories while removing the dependence on prior knowledge.
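To make the idea of association without motion cues concrete, the following is a minimal sketch, not AED's actual sim-decoder (which is a learned module). It assumes each trajectory and each detection is already represented by a feature vector, scores pairs with plain cosine similarity, and matches them greedily. The function names, the dictionary format, and the `sim_threshold` value are all hypothetical choices made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def associate(track_feats, det_feats, sim_threshold=0.5):
    """Greedily match tracks to detections using appearance similarity only.

    track_feats: dict mapping track ID -> feature vector (assumed format).
    det_feats: list of feature vectors, one per detection in the current frame.
    Returns (matches, unmatched), where matches maps detection index -> track ID
    and unmatched lists detection indices that could start new tracks.
    """
    # Collect every track/detection pair that clears the similarity threshold.
    pairs = []
    for track_id, t_feat in track_feats.items():
        for det_idx, d_feat in enumerate(det_feats):
            sim = cosine_similarity(t_feat, d_feat)
            if sim >= sim_threshold:
                pairs.append((sim, track_id, det_idx))

    # Assign the most similar pairs first; each track and detection is used once.
    pairs.sort(key=lambda p: p[0], reverse=True)
    matches = {}
    used_tracks = set()
    for _sim, track_id, det_idx in pairs:
        if track_id in used_tracks or det_idx in matches:
            continue
        matches[det_idx] = track_id
        used_tracks.add(track_id)

    unmatched = [i for i in range(len(det_feats)) if i not in matches]
    return matches, unmatched


# Toy example: two existing tracks and three detections in the new frame.
tracks = {1: [1.0, 0.0, 0.0], 2: [0.0, 1.0, 0.0]}
detections = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
matches, new_dets = associate(tracks, detections)
print(matches, new_dets)
```

In this toy example, detection 0 is assigned to track 2, detection 1 to track 1, and detection 2 stays unmatched because its similarity to both tracks is zero. No position or velocity enters the decision, which is the property the summary attributes to AED. The greedy step is a simplification; an optimal assignment solver such as `scipy.optimize.linear_sum_assignment` could be substituted.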

Associate Everything Detected: Facilitating Tracking-by-Detection to the Unknown
