Visible-Thermal Multiple Object Tracking: Large-scale Video Dataset and Progressive Fusion Approach

Yabin Zhu, Qianwu Wang, Chenglong Li, Jin Tang, Zhixiang Huang

2024-08-02

Abstract: The complementary benefits of visible and thermal infrared data are widely utilized in various computer vision tasks, such as visual tracking, semantic segmentation, and object detection, but are rarely explored in Multiple Object Tracking (MOT). In this work, we contribute a large-scale Visible-Thermal video benchmark for MOT, called VT-MOT. VT-MOT has the following main advantages. 1) The data are large-scale and highly diverse. VT-MOT includes 582 video sequence pairs and 401k frame pairs from surveillance, drone, and handheld platforms. 2) The cross-modal alignment is highly accurate. We invited several professionals to perform both spatial and temporal alignment frame by frame. 3) The annotation is dense and high-quality. VT-MOT has 3.99 million annotation boxes, annotated and double-checked by professionals, and includes heavy occlusion and object re-acquisition (objects disappear and reappear) challenges. To provide a strong baseline, we design a simple yet effective tracking framework that fuses temporal information and the complementary information of the two modalities in a progressive manner for robust visible-thermal MOT. Comprehensive experiments are conducted on VT-MOT, and the results demonstrate the superiority and effectiveness of the proposed method compared with state-of-the-art methods. From the evaluation results and analysis, we specify several potential future directions for visible-thermal MOT. The project is released at https://github.com/wqw123wqw/PFTrack.

Computer Vision and Pattern Recognition

What problem does this paper attempt to address?

This paper primarily aims to address the challenges of multi-object tracking (MOT) in complex environments, particularly under conditions such as low light and haze. Specifically:

1. **Dataset Construction**:
   - Constructed a large-scale visible-thermal infrared video benchmark dataset (VT-MOT) for multi-object tracking tasks.
   - The dataset includes 582 video sequence pairs, totaling 401k frame pairs, sourced from three platforms: drones, surveillance cameras, and handheld devices.
   - The dataset features high-precision temporal and spatial alignment and provides dense, high-quality annotations, including 3.99 million annotation boxes.
2. **Fusion Method Design**:
   - Proposed a novel progressive fusion tracking framework named PFTrack, which integrates temporal information and complementary information from two modalities (visible light and thermal infrared) to enhance target feature representation.
   - Through a two-stage fusion module (PFM), comprising temporal feature fusion and multi-modal feature fusion, it uses both multi-modal and temporal information to improve tracking performance.
3. **Experimental Validation**:
   - Conducted extensive experiments on the VT-MOT dataset, demonstrating the advantages and effectiveness of the proposed method compared to existing methods, and pointed out future research directions.

Through these efforts, the paper aims to advance the research and development of multi-object tracking under all-weather and all-time conditions.
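The two-stage idea behind the fusion module can be illustrated with a minimal sketch. This is not the PFTrack implementation: the function names (`temporal_fusion`, `modality_fusion`, `progressive_fusion`), the fixed blending weights `alpha` and `visible_weight`, and the use of plain weighted averaging in place of learned fusion layers are all assumptions made for illustration. The order of the stages (temporal first, then cross-modal) is also an assumption based on the order in which the summary lists them.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_sum(features: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Fuse equally sized feature vectors using normalized weights."""
    if not features or len(features) != len(weights):
        raise ValueError("need exactly one weight per feature vector")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(features[0])
    if any(len(f) != dim for f in features):
        raise ValueError("all feature vectors must share one dimension")
    return [
        sum(w * f[i] for f, w in zip(features, weights)) / total
        for i in range(dim)
    ]


def temporal_fusion(current: Vector, previous: Vector, alpha: float = 0.7) -> Vector:
    """Stage 1 (assumed): blend current and previous frame features of ONE modality."""
    return weighted_sum([current, previous], [alpha, 1.0 - alpha])


def modality_fusion(visible: Vector, thermal: Vector, visible_weight: float = 0.5) -> Vector:
    """Stage 2 (assumed): blend the temporally fused visible and thermal features."""
    return weighted_sum([visible, thermal], [visible_weight, 1.0 - visible_weight])


def progressive_fusion(
    vis_cur: Vector,
    vis_prev: Vector,
    th_cur: Vector,
    th_prev: Vector,
    alpha: float = 0.7,
    visible_weight: float = 0.5,
) -> Vector:
    """Apply temporal fusion per modality, then fuse across modalities."""
    vis = temporal_fusion(vis_cur, vis_prev, alpha)
    th = temporal_fusion(th_cur, th_prev, alpha)
    return modality_fusion(vis, th, visible_weight)


if __name__ == "__main__":
    # Toy 2-dimensional features; real features would be dense feature maps.
    fused = progressive_fusion([1.0, 0.0], [0.0, 0.0], [0.0, 1.0], [0.0, 0.0],
                               alpha=0.5, visible_weight=0.5)
    print(fused)
```

In a learned model, the fixed weights would typically be replaced by trainable layers, so this sketch only shows the data flow: each modality is first enriched with temporal context, and the two results are then combined.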

Online Multi-Object Tracking from A Bird's-Eye View by Fusion of Millimeter-Wave Radar and Vision

Enhancing Thermal MOT: A Novel Box Association Method Leveraging Thermal Identity and Motion Similarity

MAT: Motion-Aware Multi-Object Tracking

The Thermal Infrared Visual Object Tracking VOT-TIR2015 Challenge Results

The Thermal Infrared Visual Object Tracking VOT-TIR2016 Challenge Results

Visible-Thermal UAV Tracking: A Large-Scale Benchmark and New Baseline

Cross-Modal Object Tracking via Modality-Aware Fusion Network and a Large-Scale Dataset

Deep learning and multi-modal fusion for real-time multi-object tracking: Algorithms, challenges, datasets, and comparative study

MATI: Multimodal Adaptive Tracking Integrator for Robust Visual Object Tracking

Multiple Object Tracking by Trajectory Map Regression with Temporal Priors Embedding

OVTrack: Open-Vocabulary Multiple Object Tracking

Exploring fusion strategies for accurate RGBT visual object tracking

Misaligned Visible-Thermal Object Detection: A Drone-based Benchmark and Baseline

MotionTrack: Learning Robust Short-term and Long-term Motions for Multi-Object Tracking

MOTR: End-to-End Multiple-Object Tracking with Transformer

VisEvent: Reliable Object Tracking via Collaboration of Frame and Event Flows

STCMOT: Spatio-Temporal Cohesion Learning for UAV-Based Multiple Object Tracking

Multimodal Multiobject Tracking by Fusing Deep Appearance Features and Motion Information

MCTrack: A Unified 3D Multi-Object Tracking Framework for Autonomous Driving