Motion-Driven Tracking via End-to-End Coarse-to-Fine Verifying

Rui Wang, Bineng Zhong, Yan Chen
DOI: https://doi.org/10.1109/tcsvt.2023.3289620
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Target appearance and motion variations are the primary challenges in visual tracking. To tackle these challenges, top-performing trackers commonly rely on constructing complex appearance or motion models. However, the efficacy of these models in enhancing tracking performance can be limited by the lack of effective and seamless integration. Simplistic handcrafted fusion methods may even exacerbate the issue, resulting in a decline in tracking performance. To address this issue, we propose an end-to-end coarse-to-fine verifying approach in our motion-driven tracker. At the coarse level, we develop a motion prediction module (MPM) that efficiently extracts and utilizes motion information by leveraging the differences between adjacent frames. The MPM constructs not only a position prior for the decoder but also hybrid features that combine both motion and appearance. At the fine level, we employ a deformable transformer-based appearance model to accurately verify a local region centered on the predicted locations from the MPM. To further enhance the generalization capability of our tracker, we propose the use of an instance domain discriminator (IDD) during the training phase. This discriminator is based on domain adaptation theory and aims to sharpen the distinction between the target and other instances, thereby improving the robustness of tracking. Experimental results on five popular benchmarks, including GOT10k, LaSOT, TrackingNet, OTB, and VOT, validate the effectiveness of our proposed tracker.
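The coarse-to-fine idea described in the abstract can be illustrated with a toy sketch. This is not the authors' MPM, which is a learned module trained end to end; it is a classical frame-differencing heuristic, assumed here only to show how a difference between adjacent frames can yield a coarse position prior and how a local region around that prior can then be cropped for finer verification. The function names `coarse_motion_prior` and `local_region`, the `threshold` parameter, and the grayscale NumPy frame layout are all hypothetical choices made for illustration.

```python
import numpy as np


def coarse_motion_prior(prev_frame, curr_frame, threshold=0.1):
    """Return the (row, col) centroid of pixels that changed between two frames.

    Hypothetical stand-in for a coarse position prior: pixels whose absolute
    difference exceeds `threshold` are averaged, weighted by the difference.
    Returns None when no pixel changed enough.
    """
    diff = np.abs(curr_frame.astype(np.float64) - prev_frame.astype(np.float64))
    mask = diff > threshold
    if not mask.any():
        return None
    rows, cols = np.nonzero(mask)
    weights = diff[rows, cols]
    return (
        float(np.average(rows, weights=weights)),
        float(np.average(cols, weights=weights)),
    )


def local_region(frame, center, half_size):
    """Crop a square window around `center`, clipped to the frame bounds.

    In a coarse-to-fine pipeline, a finer appearance model would inspect
    only this window instead of the whole frame.
    """
    r, c = int(round(center[0])), int(round(center[1]))
    h, w = frame.shape[:2]
    r0, r1 = max(r - half_size, 0), min(r + half_size + 1, h)
    c0, c1 = max(c - half_size, 0), min(c + half_size + 1, w)
    return frame[r0:r1, c0:c1]


# Toy example: a 3x3 bright patch appears in an otherwise empty frame.
prev = np.zeros((20, 20))
curr = np.zeros((20, 20))
curr[5:8, 10:13] = 1.0

prior = coarse_motion_prior(prev, curr)
patch = local_region(curr, prior, half_size=2)
```

In this sketch the prior only narrows the search area; in a real sequence the difference image also responds to the location the object has just left, which is one reason a learned motion module is preferable to a fixed heuristic like this one.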
engineering, electrical & electronic