Target-Aware State Estimation for Visual Tracking

Zikun Zhou, Xin Li, Nana Fan, Hongpeng Wang, Zhenyu He
DOI: https://doi.org/10.1109/tcsvt.2021.3103063
IF: 5.859
2021-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Trackers based on the IoU prediction network (IoU-Net), which refines a coarse bounding box into an accurate one by maximizing the IoU between the target and the coarse box, have shown superior performance. However, the traditional IoU-Net is less effective in exploiting the limited but crucial supervision information contained in the initial frame, including the discriminative information between the target and the background and the structure information of the initial target. Missing such information makes the IoU-Net less robust to background distractors and to diverse variations in target appearance. To address this issue, we propose a target-aware state estimation network for visual tracking. A gradient-guided feature adjustment module is built on an online discriminative model to generate target-aware features for constructing the state estimation network; it conveys the online-learned discriminative information into the offline-trained state estimation network. In addition, we propose a structure-aware integration module and embed it into the state estimation network, enabling the tracker to explicitly model the structure information of the initial target. Extensive experimental results on the VOT2018, OTB2015, UAV123, NFS30, TC128, TrackingNet, LaSOT, and VOT2018-LT datasets demonstrate that the proposed approach performs favorably against state-of-the-art trackers.
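For readers unfamiliar with the quantity that IoU-Net-style trackers try to maximize, the following is a minimal sketch of intersection over union (IoU) for two axis-aligned boxes. It is illustrative only and is not taken from the paper: the function name `iou` and the `(x1, y1, x2, y2)` corner format are assumptions made here for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes.

    Boxes are (x1, y1, x2, y2) with x1 <= x2 and y1 <= y2.
    This corner format is an assumption for the example, not the paper's.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap region, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    # Union = sum of both areas minus the overlap counted twice.
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Guard against degenerate (zero-area) boxes.
    return inter / union if union > 0 else 0.0
```

A coarse box that only partly covers the target yields a low IoU, and a box that matches the target exactly yields 1.0. A refinement step of the kind the abstract describes would adjust the box coordinates to increase a predicted value of this score, since the true target box is unknown at tracking time.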
engineering, electrical & electronic