MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation

Jonas Serych, Michal Neoral, Jiri Matas
2024-11-15
Abstract: In this work, we present MFTIQ, a novel dense long-term tracking model that advances the Multi-Flow Tracker (MFT) framework to address challenges in point-level visual tracking in video sequences. MFTIQ builds upon the flow-chaining concepts of MFT, integrating an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computations. This decoupling significantly enhances the accuracy and flexibility of the tracking process, allowing MFTIQ to maintain reliable trajectory predictions even in scenarios of prolonged occlusions and complex dynamics. Designed to be "plug-and-play", MFTIQ can be employed with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications. Experimental validations on the TAP-Vid Davis dataset show that MFTIQ with RoMa optical flow not only surpasses MFT but also performs comparably to state-of-the-art trackers while having substantially faster processing speed. Code and models are available at https://github.com/serycjon/MFTIQ.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

The paper attempts to address the challenges of point-level visual tracking in video sequences. Specifically, the authors propose a new model named MFTIQ, which aims to improve the Multi-Flow Tracker (MFT) framework to tackle several key issues in point-level visual tracking:

1. **Accuracy of Long-term Tracking**: Traditional optical flow chaining methods are prone to trajectory drift and error accumulation during long-term tracking, especially when objects are occluded, leading to a significant decline in tracking performance.
2. **Flexibility and Generality**: Existing multi-flow trackers are often tightly coupled with specific optical flow methods, limiting their flexibility and generality. MFTIQ is designed to be "plug-and-play," allowing it to be used with any off-the-shelf optical flow method without the need for fine-tuning or architectural modifications.
3. **Ability to Handle Complex Dynamic Scenes**: In complex dynamic scenes, such as long-term occlusions and rapid movements, traditional methods struggle to maintain reliable trajectory predictions. MFTIQ improves tracking performance in these scenarios by introducing an Independent Quality (IQ) module that separates correspondence quality estimation from optical flow computation.

### Solution

The main innovations of MFTIQ include:

1. **Independent Quality (IQ) Module**: MFTIQ introduces an independent quality module that separates correspondence quality estimation from optical flow computation. This not only improves tracking accuracy and flexibility but also allows for direct estimation of the quality of the optical flow chain between the template frame and the current frame without relying on error accumulation.
2. **Plug-and-Play Functionality**: MFTIQ is designed to be used with any optical flow method after a single training session, without the need for retraining or fine-tuning. Users can choose the appropriate speed/performance trade-off as needed, enhancing the flexibility and adaptability of the tracking system.
3. **Experimental Validation**: Experimental results show that MFTIQ outperforms MFT on the TAP-Vid Davis dataset and significantly speeds up processing compared to other state-of-the-art trackers when handling dense correspondences.

### Experimental Results

- **Positional Accuracy**: MFTIQ achieved the best positional accuracy on the DAVIS dataset and the second-best positional accuracy on the ROBOTAP and KINETICS datasets.
- **Occlusion Accuracy**: While MFTIQ showed improvements in occlusion accuracy, it still lagged behind some of the latest sparse point trackers.
- **Processing Speed**: For dense tracking tasks, MFTIQ's inference time was significantly faster than other methods with similar accuracy.

### Conclusion

By introducing an independent quality module and a plug-and-play design, MFTIQ significantly enhances the performance and flexibility of long-term, dense point-level visual tracking. Experimental results validate the effectiveness of this approach and demonstrate its advantages in handling complex dynamic scenes.
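
### Illustrative Sketch

The flow-chaining idea with an independently estimated quality score, as summarized under Solution, can be illustrated with a small single-point sketch. This is not the MFTIQ code or its API: `track_point`, `flow_fn`, `quality_fn`, and the `deltas` set are hypothetical names chosen for illustration, and the toy quality function below is an oracle standing in for a learned IQ module. For each frame, the sketch builds several candidate positions by chaining an earlier stored position with one flow step, then keeps the candidate that the quality function rates highest for the template-to-current correspondence.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical interfaces, assumed for this sketch only:
#   flow_fn(src_frame, dst_frame, xy) -> displacement of xy from src to dst
#   quality_fn(cur_frame, query_xy, candidate_xy) -> score, higher is better
FlowFn = Callable[[int, int, Point], Point]
QualityFn = Callable[[int, Point, Point], float]


def track_point(query: Point, num_frames: int, flow_fn: FlowFn,
                quality_fn: QualityFn,
                deltas: Sequence[int] = (1, 2, 4, 8)) -> List[Point]:
    """Track one query point from frame 0 through all frames."""
    positions: Dict[int, Point] = {0: query}
    for t in range(1, num_frames):
        # Source frames: the template (frame 0) plus frames t - delta.
        sources = {0} | {t - d for d in deltas if t - d >= 0}
        candidates = []
        for s in sorted(sources):
            start = positions[s]            # position already chained to s
            dx, dy = flow_fn(s, t, start)   # one more flow step, s -> t
            candidates.append((start[0] + dx, start[1] + dy))
        # The score depends only on template vs. current frame, not on
        # which flow chain produced the candidate.
        positions[t] = max(candidates, key=lambda c: quality_fn(t, query, c))
    return [positions[t] for t in range(num_frames)]


# Toy setup: the point truly moves +1 px in x per frame.
def toy_flow(src: int, dst: int, xy: Point) -> Point:
    span = float(dst - src)
    # Assume short-span flow is exact and long-span flow is off by 5 px.
    return (span, 0.0) if span <= 2 else (span + 5.0, 0.0)


def toy_quality(t: int, query: Point, cand: Point) -> float:
    # Oracle stand-in for a learned quality estimate.
    true_x = query[0] + t
    return -(abs(cand[0] - true_x) + abs(cand[1] - query[1]))


track = track_point((10.0, 5.0), 6, toy_flow, toy_quality)
```

In this toy setup the direct long-span flow from the template is wrong from frame 3 onward, yet the selected trajectory stays on the true path because a shorter chain receives a higher quality score. Swapping `toy_flow` for a different flow function leaves `track_point` unchanged, which mirrors the plug-and-play property described above.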