Robust Visual Tracking with Deep Feature Fusion

Guokun Wang, Jingjing Wang, Wenyi Tang, Nenghai Yu
DOI: https://doi.org/10.1109/icassp.2017.7952490
2017-01-01
Abstract: Recently, CNN (Convolutional Neural Network) based trackers have achieved promising results thanks to their robust feature representations. However, most trackers use features from only a single layer, which limits their performance. In this paper, we propose a novel CNN-based tracker. First, we use a local detection network and a global detection network for target localization. In the local detection network, we fuse features from different layers to train a fully convolutional neural network for target localization. Because the local detection network can fail when the target disappears for a while and reappears at another location, we train a global detection network to detect whether the target has appeared again. Then, we employ a correlation filter to estimate the accurate scale of the target using HOG features extracted around the predicted location. Extensive experiments on various challenging video sequences demonstrate the effectiveness of our proposed algorithm compared with several state-of-the-art trackers.
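The control flow described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and every name in it (`fuse_feature_maps`, `track_step`, `global_detect`, `scale_estimate`, `threshold`) is hypothetical. The abstract does not say how the layers are fused or how a local-detection failure is recognized, so the sketch assumes channel concatenation after nearest-neighbour upsampling and a fixed confidence threshold on the peak response. The global detector and the HOG-based correlation filter are stand-in callables.

```python
import numpy as np


def fuse_feature_maps(shallow, deep):
    """Fuse two (C, H, W) feature maps from different layers.

    Assumption: fusion is done by upsampling `deep` to the spatial size of
    `shallow` (nearest neighbour) and concatenating along the channel axis.
    The abstract does not specify the fusion scheme.
    """
    _, h, w = shallow.shape
    _, dh, dw = deep.shape
    # Map every target pixel back to its nearest source row/column.
    rows = np.arange(h) * dh // h
    cols = np.arange(w) * dw // w
    upsampled = deep[:, rows[:, None], cols[None, :]]
    return np.concatenate([shallow, upsampled], axis=0)


def track_step(local_response, global_detect, scale_estimate, threshold=0.5):
    """Run one tracking step on a 2-D local response map.

    local_response: score map produced by the local detection network.
    global_detect:  hypothetical callable returning (row, col) or None.
    scale_estimate: hypothetical callable (row, col) -> scale, standing in
                    for the correlation filter on HOG features.
    threshold:      assumed confidence cut-off for declaring local failure.
    """
    peak = np.unravel_index(np.argmax(local_response), local_response.shape)
    if local_response[peak] >= threshold:
        position = (int(peak[0]), int(peak[1]))
    else:
        # Local detection is not confident: fall back to global detection.
        position = global_detect()
        if position is None:
            return None  # target not found in this frame
    return position, scale_estimate(*position)
```

In this sketch the scale estimator only runs once a position has been accepted, mirroring the order given in the abstract: localization first, then scale estimation around the predicted location.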