Robust Visual Tracking Method via Deep Learning

Jun-Yu GAO, Xiao-Shan YANG, Tian-Zhu ZHANG, Chang-Sheng XU
DOI: https://doi.org/10.11897/SP.J.1016.2016.01419
2016-01-01
Abstract: Traditional tracking methods (e.g., the L1 tracker) generally use raw pixel values as the feature representation and ignore the deep visual features of image patches. In a fixed real-world video scene, one can usually find a region where targets have a clear appearance and are easy to distinguish. Therefore, in this paper, we select such a region in each video to construct a training set for deep model learning. The proposed model is a deep convolutional neural network with two symmetrical paths that share weights. The goal of the network is to reduce the difference between the features of a target outside the region and inside the region. As a result, the learned network enhances the appearance features of targets and benefits trackers that rely on low-level features, such as the L1 tracker. Finally, we use this pre-trained deep convolutional network in the L1 tracker to extract features for sparse representation. Consequently, our method tracks robustly under challenges such as occlusion and illumination changes. We evaluate the proposed approach on 25 challenging videos against 9 state-of-the-art trackers. The results show that the proposed algorithm achieves an average overlap 0.11 higher than the second-best tracker and an average center location error 1.0 lower than the second-best tracker.
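The abstract reports two evaluation measures, average overlap and average center location error, without defining them. The sketch below shows how these measures are commonly computed in tracking benchmarks. It assumes that overlap means intersection over union of the predicted and ground-truth boxes, that center location error is the Euclidean distance between box centers, and that boxes are given as (x, y, width, height). The function names are hypothetical and are not taken from the paper.

```python
import math


def overlap_ratio(box_a, box_b):
    """Intersection over union of two (x, y, w, h) boxes (assumed overlap definition)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the intersection rectangle, clamped at zero.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def center_error(box_a, box_b):
    """Euclidean distance between the centers of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    dx = (ax + aw / 2.0) - (bx + bw / 2.0)
    dy = (ay + ah / 2.0) - (by + bh / 2.0)
    return math.hypot(dx, dy)


def average_scores(predicted, ground_truth):
    """Average overlap and average center error over the frames of one video."""
    if not predicted or len(predicted) != len(ground_truth):
        raise ValueError("need two non-empty box lists of equal length")
    pairs = list(zip(predicted, ground_truth))
    avg_overlap = sum(overlap_ratio(p, g) for p, g in pairs) / len(pairs)
    avg_error = sum(center_error(p, g) for p, g in pairs) / len(pairs)
    return avg_overlap, avg_error
```

Under these assumptions, a higher average overlap and a lower average center location error both indicate better tracking, which matches the direction of the comparisons reported in the abstract.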