Abstract: Transformers have shown superior performance on a wide variety of tasks since their introduction. In recent years, they have drawn attention from the vision community for tasks such as image classification and object detection. Despite this wave, an accurate and efficient multiple-object tracking (MOT) method based on transformers has yet to be designed. We argue that directly applying a transformer architecture with quadratic complexity and insufficient, noise-initialized sparse queries is not optimal for MOT. We propose TransCenter, a transformer-based MOT architecture with dense representations that accurately tracks all objects while keeping a reasonable runtime. Methodologically, we propose image-related dense detection queries and efficient sparse tracking queries produced by our carefully designed query learning networks (QLN). On the one hand, the dense image-related detection queries allow us to infer targets' locations globally and robustly through dense heatmap outputs. On the other hand, the set of sparse tracking queries efficiently interacts with image features in our TransCenter Decoder to associate object positions through time. As a result, TransCenter exhibits remarkable performance improvements and outperforms the current state-of-the-art methods by a large margin on two standard MOT benchmarks under two tracking settings (public/private). An extensive ablation study and comparisons with more naive alternatives and concurrent works further show that TransCenter is both efficient and accurate. The code is publicly available at https://github.com/yihongxu/transcenter.
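To make "inferring locations through dense heatmap outputs" concrete, here is a minimal sketch, assuming a CenterNet-style decoding step: a cell of the center heatmap counts as an object center if it passes a score threshold and is the maximum of its 3x3 neighbourhood. The association step uses greedy nearest-center matching as a simple, hypothetical stand-in for the learned sparse tracking queries; it is not TransCenter's actual mechanism. The function names `extract_centers` and `greedy_associate` and the default `threshold`, `k` and `max_dist` values are illustrative assumptions, not taken from the TransCenter code.

```python
import math
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]


def extract_centers(heatmap: Heatmap, threshold: float = 0.3,
                    k: int = 100) -> List[Tuple[int, int, float]]:
    """Return up to k (row, col, score) peaks of a dense center heatmap.

    Hypothetical helper: a cell is a peak if its score is >= threshold and it
    is the maximum of its 3x3 neighbourhood (the pure-Python equivalent of
    CenterNet-style 3x3 max-pooling suppression).
    """
    h = len(heatmap)
    w = len(heatmap[0]) if h else 0
    peaks = []
    for r in range(h):
        for c in range(w):
            score = heatmap[r][c]
            if score < threshold:
                continue
            # Maximum over the neighbourhood, clipped to the image bounds.
            local_max = max(
                heatmap[rr][cc]
                for rr in range(max(0, r - 1), min(h, r + 2))
                for cc in range(max(0, c - 1), min(w, c + 2))
            )
            if score >= local_max:
                peaks.append((r, c, score))
    # Highest-confidence detections first, capped at k.
    peaks.sort(key=lambda p: p[2], reverse=True)
    return peaks[:k]


def greedy_associate(prev: Sequence[Tuple[float, float]],
                     curr: Sequence[Tuple[float, float]],
                     max_dist: float) -> List[Tuple[int, int]]:
    """Greedily match previous track centers to current detections.

    Hypothetical stand-in for learned association: candidate pairs are
    visited in order of increasing Euclidean distance, and each track and
    detection is used at most once. Returns (prev_index, curr_index) pairs.
    """
    candidates = sorted(
        (math.dist(p, c), i, j)
        for i, p in enumerate(prev)
        for j, c in enumerate(curr)
    )
    used_prev, used_curr, matches = set(), set(), []
    for d, i, j in candidates:
        if d > max_dist:
            break  # all remaining pairs are even farther apart
        if i in used_prev or j in used_curr:
            continue
        matches.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    return matches


if __name__ == "__main__":
    hm = [
        [0.0, 0.1, 0.0, 0.0, 0.0],
        [0.1, 0.9, 0.2, 0.0, 0.0],
        [0.0, 0.2, 0.1, 0.0, 0.6],
        [0.0, 0.0, 0.0, 0.5, 0.4],
    ]
    print(extract_centers(hm))  # [(1, 1, 0.9), (2, 4, 0.6)]
    print(greedy_associate([(1, 1), (2, 4)], [(1.2, 1.1), (2, 3.8)], 1.0))
```

In the toy heatmap, the cells scoring 0.5 and 0.4 pass the threshold but are suppressed because the 0.6 cell next to them is larger, so only two centers survive. A dense heatmap lets every pixel vote for a center without a fixed set of noise-initialized queries, which is the motivation the abstract gives for dense detection queries.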

MOT-DETR: 3D Single Shot Detection and Tracking with Transformers to build 3D representations for Agro-Food Robots

Beyond Traditional Driving Scenes: A Robotic-Centric Paradigm for 2D+3D Human Tracking Using Siamese Transformer Network

Multi-modal 3D Human Tracking for Robots in Complex Environment with Siamese Point-Video Transformer

InterTrack: Interaction Transformer for 3D Multi-Object Tracking

Object-Level Pseudo-3D Lifting for Distance-Aware Tracking

MotionTrack: End-to-End Transformer-based Multi-Object Tracing with LiDAR-Camera Fusion

Relation3DMOT: Exploiting Deep Affinity for 3D Multi-Object Tracking from View Aggregation

MSF3DDETR: Multi-Sensor Fusion 3D Detection Transformer for Autonomous Driving

EagerMOT: 3D Multi-Object Tracking via Sensor Fusion

M3DeTR: Multi-representation, Multi-scale, Mutual-relation 3D Object Detection with Transformers

MOTR: End-to-End Multiple-Object Tracking with Transformer

Strong-TransCenter: Improved Multi-Object Tracking based on Transformers with Dense Representations

TransCenter: Transformers With Dense Representations for Multiple-Object Tracking

MCTR: Multi Camera Tracking Transformer

PuTR: A Pure Transformer for Decoupled and Online Multi-Object Tracking

FastTrackTr: Towards Fast Multi-Object Tracking with Transformers

TrackFormer: Multi-Object Tracking with Transformers

A comparison between single-stage and two-stage 3D tracking algorithms for greenhouse robotics

A Tracking-By-Detection Based 3D Multiple Object Tracking for Autonomous Driving

MeMOTR: Long-Term Memory-Augmented Transformer for Multi-Object Tracking