Joint Feature Correspondences and Appearance Similarity for Robust Visual Object Tracking
Zulfiqar Hasan Khan, Irene Yu-Hua Gu
DOI: https://doi.org/10.1109/tifs.2010.2050312
IF: 7.231
2010-09-01
IEEE Transactions on Information Forensics and Security
Abstract: A novel visual object tracking scheme is proposed that jointly uses point feature correspondences and object appearance similarity. For point feature-based tracking, we propose a candidate tracker that simultaneously exploits two separate sets of point feature correspondences, one in the foreground and one in the surrounding background, where the background features are used to indicate occlusions. The feature points in these two sets are dynamically maintained. For object appearance-based tracking, we propose a candidate tracker based on an enhanced anisotropic mean shift with a fully tunable bounding box (five degrees of freedom) that is partially guided by the feature point tracker. Both candidate trackers contain a reinitialization process that resets the tracker to prevent tracking errors from accumulating and propagating across frames. In addition, a novel online learning method is introduced for the enhanced mean shift-based candidate tracker: the reference object distribution is updated in each time interval if there is an indication of stable and reliable tracking without background interference. Dynamically updating the reference object model yields a more accurate object appearance similarity measure and thus further improves tracking. An optimal selection criterion is applied to the results of the candidate trackers to obtain the final tracker. Experiments have been conducted on several videos containing a range of complex scenarios. The proposed scheme is evaluated using three objective criteria and compared with two existing trackers. The results show that the proposed scheme is robust and yields a marked improvement in terms of tracking drift, tightness, and accuracy of the tracked bounding boxes, especially for complex video scenarios containing long-term partial occlusions or intersections, deformation, or background clutter with color distributions similar to those of the foreground object.
computer science, theory & methods; engineering, electrical & electronic
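The online learning step described in the abstract (updating the reference object distribution only when tracking is stable and free of background interference) can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, the reliability test, or the update rule, so everything here is an assumption: the Bhattacharyya coefficient is used because it is common in mean shift tracking, and the names `bhattacharyya`, `update_reference`, `threshold`, and `rate` are hypothetical and not taken from the paper.

```python
import math


def normalize(hist):
    """Scale a non-negative histogram so that its bins sum to 1."""
    total = sum(hist)
    if total <= 0:
        raise ValueError("histogram must have positive mass")
    return [h / total for h in hist]


def bhattacharyya(p, q):
    """Bhattacharyya coefficient between two normalized histograms.

    Assumed similarity measure; 1.0 means identical distributions.
    """
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def update_reference(reference, candidate, similarity, occluded,
                     threshold=0.8, rate=0.1):
    """Blend the candidate histogram into the reference model.

    Hypothetical rule: update only when no occlusion is flagged and the
    similarity is at least `threshold`; otherwise keep the old model.
    """
    if occluded or similarity < threshold:
        return list(reference)
    blended = [(1 - rate) * r + rate * c
               for r, c in zip(reference, candidate)]
    return normalize(blended)


# Usage: a 3-bin color histogram for the reference and the current frame.
reference = normalize([4, 4, 2])
candidate = normalize([2, 4, 4])
rho = bhattacharyya(reference, candidate)
reference = update_reference(reference, candidate, rho, occluded=False)
```

Gating the update on an occlusion flag mirrors the abstract's condition that the model is refreshed only under reliable tracking; in the paper that indication comes from the background feature points, whereas here it is simply passed in as a boolean.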