Abstract: Multi-Object Tracking, also known as Multi-Target Tracking, is a significant area of computer vision with uses in a wide variety of settings. The development of deep learning, which has encouraged researchers to propose more and more work in this direction, has significantly advanced the study of tracking as well as many other domains related to computer vision. In fact, all of the current state-of-the-art solutions in the literature and in the tracking industry are built on deep learning methodologies that produce exceptionally good results. Deep learning is made possible by the ever more powerful hardware researchers can use to meet the significant computational demands of these models. However, when real-time operation is a main requirement, tracking systems must be developed without depending on expensive hardware with enormous computational resources if tracking is to spread to real-world contexts. To this end, a compromise is to combine powerful deep strategies with more traditional approaches, favoring solutions with considerably lower processing cost that give somewhat less accurate tracking results but remain suitable for real-time domains. The present work goes in that direction, proposing a hybrid strategy for real-time multi-target tracking that effectively combines a classical optical flow algorithm with a deep learning architecture, targeted at a human-crowd tracking system with a desirable trade-off between tracking precision and computational cost. The architecture was evaluated under different settings and yielded a MOTA of 0.608, against 0.549 for the compared state-of-the-art method; introducing the optical flow phase cut the running time roughly in half while achieving almost the same accuracy.
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve
The paper addresses the trade-off between tracking accuracy and computational cost in real-time multi-target tracking. Specifically, the researchers seek to develop a system that tracks multiple targets efficiently and accurately with limited computational resources. Existing multi-target tracking methods, while highly accurate, typically require powerful hardware, which limits their use in practical applications. The paper therefore proposes a hybrid strategy that combines a classical optical flow algorithm with a deep learning architecture to achieve a good trade-off between tracking accuracy and computational cost.
### Main Challenges
1. **Real-Time Requirements**: In real-time applications, the tracking system must complete processing within a limited time frame and cannot rely on expensive hardware support.
2. **Computational Resource Constraints**: In many practical application scenarios, computational resources are limited, necessitating a reduction in computational cost while ensuring tracking accuracy.
3. **Robustness in Complex Scenarios**: Multi-target tracking must cope with target scale changes, lighting variations, occlusions, and targets going out of bounds (leaving the frame), all of which can degrade tracking performance.
### Solutions
To address the above challenges, the paper proposes the following solutions:
1. **Hybrid Algorithm**: Combining an optical flow algorithm with a deep learning model, to exploit the low computational cost of optical flow and the high accuracy of deep learning.
2. **Improved FairMOT Model**: Starting from the FairMOT model, introducing the Byte data association method, replacing the original encoder-decoder structure by using ConvNeXt Tiny and EfficientNet B3 as the new encoders, and incorporating a Feature Pyramid Network (FPN) in the decoder.
3. **Optical Flow Estimation**: Using the Pyramidal Lucas Kanade (PLK) algorithm for optical flow estimation between multiple consecutive frames to reduce computational load and improve tracking speed.
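
The interplay between the deep model and the optical flow phase can be sketched as a simple scheduling loop. This is only an illustration under stated assumptions, not the paper's implementation: the summary does not say how often the deep model runs, so the fixed interval `detect_every` is an assumption, and `detect` and `estimate_flow` are hypothetical callables standing in for the FairMOT-style network and a Pyramidal Lucas Kanade routine (in practice the latter might wrap OpenCV's `cv2.calcOpticalFlowPyrLK`). Identity management and data association are left out.

```python
from typing import Any, Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) in pixels
Flow = Tuple[float, float]               # (dx, dy) displacement for one box


def shift_box(box: Box, flow: Flow) -> Box:
    """Translate a box by the displacement estimated for it."""
    x1, y1, x2, y2 = box
    dx, dy = flow
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)


def track_hybrid(
    frames: Sequence[Any],
    detect: Callable[[Any], List[Box]],
    estimate_flow: Callable[[Any, Any, List[Box]], List[Flow]],
    detect_every: int = 2,
) -> List[List[Box]]:
    """Run the expensive detector on every `detect_every`-th frame and
    propagate boxes with optical flow on the frames in between.

    ASSUMPTION: a fixed interval; the paper's actual schedule is not
    given in this summary.
    """
    if detect_every < 1:
        raise ValueError("detect_every must be at least 1")

    results: List[List[Box]] = []
    boxes: List[Box] = []
    prev_frame = None
    for index, frame in enumerate(frames):
        if index % detect_every == 0:
            # Expensive step: deep model (hypothetical `detect`).
            boxes = list(detect(frame))
        else:
            # Cheap step: one displacement per box (hypothetical `estimate_flow`).
            flows = estimate_flow(prev_frame, frame, boxes)
            boxes = [shift_box(b, f) for b, f in zip(boxes, flows)]
        results.append(boxes)
        prev_frame = frame
    return results
```

With `detect_every=2` the deep model runs on only half of the frames, which is one plausible way a roughly halved running time could arise if the flow step is cheap by comparison.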
With these improvements, the proposed method reaches a MOTA of 0.608 on the MOT2015 dataset, against 0.549 for the state-of-the-art method it is compared with. Introducing the optical flow phase cuts the running time roughly in half, which corresponds to about double the frame rate, while keeping almost the same accuracy. This indicates that the method maintains high tracking accuracy while substantially improving real-time performance, making it suitable for resource-constrained real-time applications.
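
For readers unfamiliar with the metric, MOTA (Multiple Object Tracking Accuracy, from the CLEAR MOT metrics) is commonly defined as one minus the ratio of all tracking errors (false negatives, false positives, and identity switches, summed over all frames) to the total number of ground-truth objects. The sketch below shows that standard definition; the function name and the example counts are illustrative assumptions and are not taken from the paper.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_ground_truth: int) -> float:
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames.

    The value can be negative when errors outnumber ground-truth objects.
    """
    if num_ground_truth <= 0:
        raise ValueError("num_ground_truth must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_ground_truth


# Illustrative, made-up counts (NOT the paper's data):
# 30 misses, 8 false alarms, 2 identity switches over 100 ground-truth objects.
example_score = mota(30, 8, 2, 100)
```

Under this definition, a MOTA of 0.608 means the combined error count is about 39% of the number of ground-truth objects, and 0.549 corresponds to about 45%.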