mmMCL3DMOT: Multi-Modal Momentum Contrastive Learning for 3D Multi-Object Tracking

Ru Hong, Jiming Yang, Weidian Zhou, Feipeng Da
DOI: https://doi.org/10.1109/lsp.2024.3431435
2024-01-01
IEEE Signal Processing Letters
Abstract: 3D multi-object tracking methods utilize object motion, image, and point cloud information to compute similarities between different objects, facilitating cross-frame data association. In this letter, we propose a novel approach called mmMCL3DMOT to calculate object appearance similarity by employing multi-modal momentum contrastive self-supervised learning. We introduce three key techniques. First, a self-supervised training paradigm is adopted, incorporating image, point cloud, and existing 3D detection inputs to enable multi-modal feature extraction without manual annotation. Second, our feature learning approach combines intra-modal and cross-modal feature correspondences within image and point cloud modalities, resulting in more discriminative feature extraction with momentum contrast. Finally, by computing similarity using the multi-modal features and incorporating a robust motion metric, we enable joint cascade reasoning for object association, leading to high-performance 3D MOT. Extensive experiments have demonstrated the significant impact of our method. Moreover, our tracker achieves state-of-the-art (SOTA) performance on both the KITTI and nuScenes datasets.
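The abstract names momentum contrast as the basis for learning appearance features and says the training signal combines intra-modal and cross-modal correspondences. The sketch below is not the authors' implementation; it is a minimal NumPy illustration of a MoCo-style objective under stated assumptions: linear stand-in encoders, a shared embedding dimension `D` for both modalities, a random negative queue, and a loss that simply sums four InfoNCE terms (image to image, points to points, image to points, points to image). All helper names (`info_nce`, `momentum_update`, `multimodal_contrastive_loss`) and the temperature and momentum values are hypothetical.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Project each row onto the unit sphere, as is usual for contrastive embeddings."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def momentum_update(key_w, query_w, m=0.999):
    """MoCo-style EMA: the key encoder slowly tracks the query encoder."""
    return m * key_w + (1.0 - m) * query_w


def info_nce(q, k_pos, queue, tau=0.07):
    """InfoNCE with one positive key per query and a queue of negatives.

    q, k_pos: (N, D) L2-normalized; row i of k_pos is the positive for q[i].
    queue:    (K, D) L2-normalized keys from earlier batches (negatives).
    """
    pos = np.sum(q * k_pos, axis=1, keepdims=True)        # (N, 1)
    neg = q @ queue.T                                      # (N, K)
    logits = np.concatenate([pos, neg], axis=1) / tau      # (N, 1 + K)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob_pos = logits[:, 0] - np.log(np.exp(logits).sum(axis=1))
    return float(-log_prob_pos.mean())


def multimodal_contrastive_loss(img_q, pts_q, img_k, pts_k,
                                img_queue, pts_queue, tau=0.07):
    """Hypothetical equal-weight sum of intra-modal and cross-modal InfoNCE terms."""
    intra = info_nce(img_q, img_k, img_queue, tau) + info_nce(pts_q, pts_k, pts_queue, tau)
    cross = info_nce(img_q, pts_k, pts_queue, tau) + info_nce(pts_q, img_k, img_queue, tau)
    return intra + cross


# Toy usage: random stand-ins for per-object image and point-cloud features.
rng = np.random.default_rng(0)
N, D_IMG, D_PTS, D, K = 8, 32, 16, 8, 64
img_in = rng.normal(size=(N, D_IMG))
pts_in = rng.normal(size=(N, D_PTS))

# Linear "encoders" into a shared D-dim space; key encoders start as copies.
W_img_q = rng.normal(size=(D_IMG, D))
W_pts_q = rng.normal(size=(D_PTS, D))
W_img_k, W_pts_k = W_img_q.copy(), W_pts_q.copy()

# Queries from one view, keys from a lightly perturbed second view.
img_q = l2_normalize(img_in @ W_img_q)
pts_q = l2_normalize(pts_in @ W_pts_q)
img_k = l2_normalize((img_in + 0.05 * rng.normal(size=img_in.shape)) @ W_img_k)
pts_k = l2_normalize((pts_in + 0.05 * rng.normal(size=pts_in.shape)) @ W_pts_k)

# Assumption: random negatives stand in for a queue of keys from past batches.
img_queue = l2_normalize(rng.normal(size=(K, D)))
pts_queue = l2_normalize(rng.normal(size=(K, D)))

loss = multimodal_contrastive_loss(img_q, pts_q, img_k, pts_k, img_queue, pts_queue)
# A gradient step on W_img_q / W_pts_q would go here; it is omitted in this sketch.
W_img_k = momentum_update(W_img_k, W_img_q)
W_pts_k = momentum_update(W_pts_k, W_pts_q)
```

Because the key encoders change only through the exponential moving average, they drift slowly, which keeps the queued negatives roughly consistent across batches; that is the usual motivation for momentum contrast. The abstract does not say how mmMCL3DMOT weights the intra- and cross-modal terms, so the equal weighting here is an assumption.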
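For the association step, the abstract says only that multi-modal appearance similarity and a motion metric are combined for joint cascade reasoning. A minimal sketch of one possible two-stage cascade follows, under these assumptions: appearance similarity is the mean of image and point-cloud cosine similarities; motion similarity is a hypothetical Gaussian kernel on 3D center distance (with `trk_ctr` holding motion-predicted track centers); stage 1 matches on a weighted joint affinity, and stage 2 re-matches the leftovers on motion alone with a looser threshold. Greedy matching is used in place of the Hungarian algorithm to keep the example dependency-free. The function names, `alpha`, `scale`, and both thresholds are illustrative, not the paper's.

```python
import numpy as np


def cosine_sim(a, b, eps=1e-12):
    """Pairwise cosine similarity between rows of a (T, D) and b (N, D)."""
    a = a / (np.linalg.norm(a, axis=1, keepdims=True) + eps)
    b = b / (np.linalg.norm(b, axis=1, keepdims=True) + eps)
    return a @ b.T


def motion_sim(trk_ctr, det_ctr, scale=2.0):
    """Hypothetical motion metric: Gaussian kernel on 3D center distance."""
    d = np.linalg.norm(trk_ctr[:, None, :] - det_ctr[None, :, :], axis=-1)
    return np.exp(-(d / scale) ** 2)


def greedy_match(affinity, threshold):
    """Repeatedly take the highest remaining pair until it falls below threshold."""
    aff = affinity.astype(float).copy()
    matches = []
    while aff.size and aff.max() >= threshold:
        r, c = np.unravel_index(np.argmax(aff), aff.shape)
        matches.append((int(r), int(c)))
        aff[r, :] = -np.inf  # each track is used at most once
        aff[:, c] = -np.inf  # each detection is used at most once
    return matches


def cascade_associate(trk_img, trk_pts, trk_ctr, det_img, det_pts, det_ctr,
                      alpha=0.5, t_joint=0.6, t_motion=0.3):
    """Two-stage cascade returning (track_idx, det_idx) pairs."""
    app = 0.5 * (cosine_sim(trk_img, det_img) + cosine_sim(trk_pts, det_pts))
    mot = motion_sim(trk_ctr, det_ctr)
    joint = alpha * app + (1.0 - alpha) * mot

    # Stage 1: confident matches on the joint appearance + motion affinity.
    stage1 = greedy_match(joint, t_joint)

    # Stage 2: leftovers matched on motion only, with a looser threshold.
    mot_left = mot.astype(float).copy()
    for r, c in stage1:
        mot_left[r, :] = -np.inf
        mot_left[:, c] = -np.inf
    stage2 = greedy_match(mot_left, t_motion)
    return stage1 + stage2
```

The second stage lets a track survive a frame in which its appearance embedding is unreliable (for example, under occlusion) as long as its predicted position still agrees with a detection; tracks and detections left unmatched after both stages would then be handled by the tracker's birth and death logic.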