CAMOT: Camera Angle-aware Multi-Object Tracking

Felix Limanta, Kuniaki Uto, Koichi Shinoda
DOI: https://doi.org/10.1109/WACV57701.2024.00635
2024-09-26
Abstract: This paper proposes CAMOT, a simple camera angle estimator for multi-object tracking to tackle two problems: 1) occlusion and 2) inaccurate distance estimation in the depth direction. Under the assumption that multiple objects are located on a flat plane in each video frame, CAMOT estimates the camera angle using object detection. In addition, it gives the depth of each object, enabling pseudo-3D MOT. We evaluated its performance by adding it to various 2D MOT methods on the MOT17 and MOT20 datasets and confirmed its effectiveness. Applying CAMOT to ByteTrack, we obtained 63.8% HOTA, 80.6% MOTA, and 78.5% IDF1 in MOT17, which are state-of-the-art results. Its computational cost is significantly lower than the existing deep-learning-based depth estimators for tracking.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

The paper aims to address two main issues in Multi-Object Tracking (MOT):

1. **Occlusion Problem**: In real-world scenarios, target objects are often occluded by other objects, leading to detection failures.
2. **Inaccurate Distance Estimation in Depth Direction**: When multiple objects are aligned in the depth direction, it is difficult to accurately estimate the distance between them, which may result in incorrect object association between frames.

To tackle these problems, the paper proposes CAMOT (Camera Angle-aware Multi-Object Tracking). CAMOT estimates the camera angle and uses it to provide the depth of each object, enabling pseudo-3D multi-object tracking. Specifically, CAMOT assumes that the objects in each video frame are located on a flat plane and estimates the camera angle from the object detections. This helps mitigate occlusion and measures distance in the depth direction more accurately, which improves object association between frames.

### Main Contributions

1. **Lightweight Camera Angle Estimator**: Uses object detection positions to estimate the camera angle.
2. **Frame-to-Frame Object Association Using Camera Angle and Object Depth**: Combines camera angle and object depth information in 2D MOT to improve association accuracy.
3. **Evaluation on Various 2D MOT Methods**: Adds CAMOT to various 2D MOT methods on the MOT17 and MOT20 datasets and verifies its effectiveness.

### Experimental Results

- On the MOT17 dataset, ByteTrack with CAMOT achieved 63.8% HOTA, 80.6% MOTA, and 78.5% IDF1, which the authors report as state-of-the-art results.
- The computational cost is significantly lower than that of existing deep-learning-based depth estimators, with a speed of 24.92 FPS on a single A100 GPU.

### Conclusion

CAMOT addresses occlusion and depth-direction distance estimation in multi-object tracking with a simple camera angle estimator. It improves tracking performance at low computational cost, which makes it suitable for practical applications.
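
To make the flat-plane assumption concrete, the following Python sketch shows one way a camera angle could be recovered from detection boxes and then turned into per-object depth. It is an illustration under stated assumptions, not the estimator from the paper: it assumes a pinhole camera with a known focal length `f` and principal row `cy` (both in pixels), a known camera height, a single shared object height, and a brute-force grid search over the pitch angle. All function names and parameter values are hypothetical.

```python
import math

def ground_distance(v_foot, theta, f, cy, cam_height):
    """Distance along the ground to a point imaged at row v_foot (pixels)."""
    # Angle of the viewing ray below the horizontal for a camera pitched down by theta.
    angle = theta + math.atan2(v_foot - cy, f)
    if not 0.0 < angle < math.pi / 2:
        raise ValueError("viewing ray does not hit the ground in front of the camera")
    return cam_height / math.tan(angle)

def project_row(point_height, distance, theta, f, cy, cam_height):
    """Image row of a point at point_height above the ground, at the given ground distance."""
    angle = math.atan2(cam_height - point_height, distance)
    return cy + f * math.tan(angle - theta)

def estimate_theta(boxes, f, cy, cam_height, obj_height, steps=2000):
    """Grid-search the pitch angle that best explains (v_head, v_foot) box rows.

    Assumption (not from the paper): every object has the same real height.
    """
    best_theta, best_err = None, float("inf")
    for i in range(1, steps):
        theta = (math.pi / 2) * i / steps
        err = 0.0
        try:
            for v_head, v_foot in boxes:
                d = ground_distance(v_foot, theta, f, cy, cam_height)
                predicted_head = project_row(obj_height, d, theta, f, cy, cam_height)
                err += (predicted_head - v_head) ** 2
        except ValueError:
            continue  # this candidate angle puts a foot point off the ground plane
        if err < best_err:
            best_theta, best_err = theta, err
    return best_theta

# Synthetic check: build boxes from a known angle, then recover it.
F, CY, CAM_H, OBJ_H = 1000.0, 540.0, 5.0, 1.7  # hypothetical camera and object values
TRUE_THETA = 0.3  # radians
boxes = [
    (project_row(OBJ_H, d, TRUE_THETA, F, CY, CAM_H),
     project_row(0.0, d, TRUE_THETA, F, CY, CAM_H))
    for d in (8.0, 15.0, 30.0)
]
theta_hat = estimate_theta(boxes, F, CY, CAM_H, OBJ_H)
depths = [ground_distance(v_foot, theta_hat, F, CY, CAM_H) for _, v_foot in boxes]
print(round(theta_hat, 3), [round(d, 1) for d in depths])
```

With an angle estimate in hand, each box's bottom edge maps to a ground distance, which a 2D tracker could use as an extra cue when two boxes overlap in the image but lie at different depths. Several boxes at different distances are needed here because a single box is generally consistent with more than one pitch angle.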