Multi-object tracking with adaptive measurement noise and information fusion
Xi Huang, Yinwei Zhan
DOI: https://doi.org/10.1016/j.imavis.2024.104964
IF: 3.86
2024-02-28
Image and Vision Computing
Abstract: Multi-object tracking (MOT) is a challenging task in computer vision that aims to estimate the trajectories of multiple objects in a video sequence. Observation-Centric SORT (OCSORT) is a pure motion-based MOT algorithm that uses the Kalman filter as the motion model and three observation-centric techniques: Re-Update, Momentum and Recovery, to enhance the data association. However, OCSORT is limited by camera motion error, constant measurement noise and lack of appearance information. In this paper, we propose three methods to address these limitations and improve the performance of OCSORT. First, we use Enhanced Correlation Coefficient Maximization (ECC) to compensate for the camera motion between adjacent frames. Second, we adjust the measurement noise scale for the Kalman filter according to the detection confidence. Third, we introduce a deep visual feature model to extract appearance information and propose a method to effectively use both motion and appearance information. The proposed method first filters out the inappropriate appearance information based on motion information and then combines the filtered appearance information with the motion information by minimization. We evaluate our algorithm on three MOT benchmarks: MOT17, MOT20 and DanceTrack. The results show that our algorithm achieves state-of-the-art performance on all datasets, especially on DanceTrack, where the objects have highly nonlinear motion and frequent occlusion. Compared to OCSORT, our algorithm improves Higher Order Tracking Accuracy (HOTA) by 1.1%, 0.8%, and 3.5%, and ID F1 Score (IDF1) by 1.7%, 1.9%, and 4.3% on MOT17, MOT20 and DanceTrack, respectively.
computer science, artificial intelligence, theory & methods, engineering, electrical & electronic, software engineering, optics
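
The abstract names two mechanisms that a short sketch may help make concrete: scaling the Kalman filter's measurement noise by detection confidence, and fusing motion with appearance costs by filtering and then minimization. The abstract gives no formulas, so everything specific below is an assumption for illustration and not the authors' implementation: the linear `(1 - confidence)` scale factor, the `motion_gate` threshold, the convention that costs lie in [0, 1], and the function names `scaled_measurement_noise` and `fuse_costs` are all hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def scaled_measurement_noise(base_r: Matrix, confidence: float) -> Matrix:
    """Return the measurement noise covariance scaled by (1 - confidence).

    Assumption: a linear (1 - confidence) factor. The abstract only states
    that the noise scale depends on the detection confidence, so a confident
    detection is trusted more (smaller R) and a weak one less (larger R).
    """
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    scale = 1.0 - confidence
    return [[scale * value for value in row] for row in base_r]


def fuse_costs(motion_cost: Matrix, appearance_cost: Matrix,
               motion_gate: float = 0.5) -> Matrix:
    """Filter appearance costs by motion, then fuse by element-wise minimum.

    Assumption: costs lie in [0, 1] (lower is better), and an appearance cost
    is treated as unreliable when the motion cost of the same
    track/detection pair exceeds `motion_gate`.
    """
    fused: Matrix = []
    for motion_row, appearance_row in zip(motion_cost, appearance_cost):
        fused_row = []
        for motion, appearance in zip(motion_row, appearance_row):
            # Filtering step: discard appearance evidence for pairs that the
            # motion model considers implausible by setting it to the
            # maximum cost.
            filtered = appearance if motion <= motion_gate else 1.0
            # Fusion step: keep whichever cue gives the lower cost.
            fused_row.append(min(motion, filtered))
        fused.append(fused_row)
    return fused


if __name__ == "__main__":
    base_r = [[1.0, 0.0], [0.0, 4.0]]
    print(scaled_measurement_noise(base_r, confidence=0.75))

    motion = [[0.2, 0.9], [0.8, 0.3]]          # e.g. 1 - IoU per pair
    appearance = [[0.1, 0.05], [0.6, 0.4]]     # e.g. cosine distance per pair
    print(fuse_costs(motion, appearance))
```

In this sketch, the pair with motion cost 0.9 keeps that cost even though its appearance cost is only 0.05, because the gate rejects appearance evidence that contradicts the motion model. The fused matrix would then typically be passed to an assignment solver such as the Hungarian algorithm.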