MapTrack: Tracking in the Map

Fei Wang, Ruohui Zhang, Chenglin Chen, Min Yang, Yun Bai
2024-02-20
Abstract: Multi-Object Tracking (MOT) aims to maintain stable and uninterrupted trajectories for each target. Most state-of-the-art approaches first detect objects in each frame and then implement data association between new detections and existing tracks using motion models and appearance similarities. Despite achieving satisfactory results, occlusion and crowds can easily lead to missing and distorted detections, followed by missing and false associations. In this paper, we first revisit the classic tracker DeepSORT, enhancing its robustness to crowds and occlusion significantly by placing greater trust in predictions when detections are unavailable or of low quality in crowded and occluded scenes. Specifically, we propose a new framework comprising three lightweight and plug-and-play algorithms: the probability map, the prediction map, and the covariance adaptive Kalman filter. The probability map identifies whether undetected objects have genuinely disappeared from view (e.g., out of the image or entered a building) or are only temporarily undetected due to occlusion or other reasons. Trajectories of undetected targets that are still within the probability map are extended by state estimations directly. The prediction map determines whether an object is in a crowd, and we prioritize state estimations over observations when severe deformation of observations occurs, accomplished through the covariance adaptive Kalman filter. The proposed method, named MapTrack, achieves state-of-the-art results on popular multi-object tracking benchmarks such as MOT17 and MOT20. Despite its superior performance, our method remains simple, online, and real-time. The code will be open-sourced later.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses several key problems in Multi-Object Tracking (MOT), particularly tracking objects in crowded scenes and under occlusion. It argues that existing tracking-by-detection methods, such as the classic tracker DeepSORT, have the following weaknesses in these scenarios:

1. **ID Switching Problem**: In crowded scenes, when objects cross or occlude one another, the same object may be assigned different IDs in consecutive frames. The detector may miss targets that are partially or completely occluded, or it may return erroneous bounding boxes, so the matching algorithm associates incorrect detections with the wrong trajectories.
2. **Local Information Dependency Problem**: Existing methods typically use information from only two frames for data association and do not exploit global information, which limits their ability to handle complex scenes.
3. **Appearance Similarity Confusion Problem**: Targets with similar appearances can confuse the re-identification (ReID) model, which further aggravates ID switching.
4. **Unreasonable Priority in Matching Algorithm**: DeepSORT's matching cascade tends to give higher matching priority to more frequently appearing targets, which is unreasonable in many scenarios.

To address these issues, the paper proposes a new framework, MapTrack, which makes the tracker more robust by introducing a Probability Map, a Prediction Map, and a Covariance Adaptive Kalman Filter. The specific improvements include:

- **Probability Map**: Determines whether an undetected target has truly disappeared or is only temporarily undetected. If a trajectory lies within the probability map and has no matching detection, the system treats the target as possibly occluded or missed by the detector and extends the trajectory with the predicted position directly.
- **Prediction Map**: Determines whether a target is in a crowded area. If the detected bounding box is severely deformed, the system relies more on the state estimate than on the observation, which is achieved through the Covariance Adaptive Kalman Filter.
- **Covariance Adaptive Kalman Filter**: Dynamically adjusts the measurement covariance based on detection quality, reducing mismatches in crowded and occluded scenes.
- **Momentum Strategy**: Smooths velocity measurements, reducing the error accumulation caused by the constant-velocity assumption.

With these improvements, MapTrack achieves state-of-the-art performance on popular benchmarks such as MOT17 and MOT20 while remaining simple, online, and real-time.
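
The covariance adaptation and momentum ideas described above can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the paper's formulas, so the scaling rule in `adaptive_measurement_variance`, the `crowd_factor` parameter, the exponential-moving-average form of `smooth_velocity`, and the one-dimensional `Track1D` class are all assumptions chosen for illustration. The only point it is meant to show is that inflating the measurement variance makes a Kalman update lean on the prediction rather than the detection.

```python
from dataclasses import dataclass


def adaptive_measurement_variance(base_r: float, confidence: float,
                                  in_crowd: bool, crowd_factor: float = 10.0) -> float:
    """Assumed rule (not from the paper): inflate the measurement variance
    when detection confidence is low or the target is flagged as in a crowd."""
    conf = min(max(confidence, 0.05), 1.0)  # clamp to avoid division by zero
    r = base_r / conf
    if in_crowd:
        r *= crowd_factor
    return r


def smooth_velocity(previous_v: float, measured_v: float, beta: float = 0.9) -> float:
    """Assumed momentum form: exponential moving average of the velocity."""
    return beta * previous_v + (1.0 - beta) * measured_v


@dataclass
class Track1D:
    """Constant-velocity Kalman track over a single coordinate (illustrative)."""
    x: float            # position estimate
    v: float            # velocity estimate
    p_xx: float = 1.0   # entries of the symmetric 2x2 state covariance
    p_xv: float = 0.0
    p_vv: float = 1.0

    def predict(self, q: float = 0.01) -> None:
        # State: x' = x + v. Covariance: P' = F P F^T + q*I, F = [[1, 1], [0, 1]].
        self.x += self.v
        p_xx = self.p_xx + 2.0 * self.p_xv + self.p_vv + q
        p_xv = self.p_xv + self.p_vv
        p_vv = self.p_vv + q
        self.p_xx, self.p_xv, self.p_vv = p_xx, p_xv, p_vv

    def update(self, z: float, r: float) -> None:
        # Standard Kalman update with H = [1, 0] and measurement variance r.
        s = self.p_xx + r
        k_x, k_v = self.p_xx / s, self.p_xv / s
        residual = z - self.x
        self.x += k_x * residual
        self.v += k_v * residual
        # P = (I - K H) P, written out entry by entry.
        p_xx = (1.0 - k_x) * self.p_xx
        p_xv = (1.0 - k_x) * self.p_xv
        p_vv = self.p_vv - k_v * self.p_xv
        self.p_xx, self.p_xv, self.p_vv = p_xx, p_xv, p_vv


# Two identical tracks receive the same detection at z = 10.
trusted = Track1D(x=0.0, v=0.0)
distrusted = Track1D(x=0.0, v=0.0)
trusted.predict()
distrusted.predict()

# A confident detection in an uncrowded area keeps the base variance.
trusted.update(10.0, adaptive_measurement_variance(0.1, confidence=1.0, in_crowd=False))
# A low-confidence detection in a crowd gets a much larger variance,
# so the update stays close to the predicted position.
distrusted.update(10.0, adaptive_measurement_variance(0.1, confidence=0.2, in_crowd=True))
```

In this sketch the `trusted` track moves most of the way toward the detection, while the `distrusted` track moves less, which mirrors the summary's description of relying on state estimation when the observation is unreliable. A real tracker would apply the same idea to a full bounding-box state rather than a single coordinate.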