Learning Non-local Representation for Visual Tracking.

Peng Zhang, Zengfu Wang
DOI: https://doi.org/10.1007/978-3-030-03341-5_18
2018-01-01
Abstract: Discriminative Correlation Filter (DCF) based trackers have tremendously improved tracking performance. They use the first frame of a video sequence to initialize the tracker and provide a fast solution thanks to their formulation in the Fourier domain. Previous work that applies a DCF layer on top of a pretrained CNN, however, has not taken full advantage of the CNN feature maps. In this paper, we propose a tracking architecture that fuses local and global response maps for accurate and robust visual tracking. The feature map extracted from a pretrained CNN is fed to a fully-convolutional DCF layer and a non-local layer to capture the local and global response maps. Experiments show that our method achieves state-of-the-art performance on three popular benchmarks: OTB-2013, OTB-2015 and VOT2016.
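As background for the "fast solution in the Fourier domain" mentioned in the abstract, the following is a minimal sketch of a generic single-channel correlation filter (in the style of MOSSE-type ridge regression), not the fully-convolutional DCF layer or the non-local layer proposed in the paper. The function names (`gaussian_label`, `train_dcf`, `locate`), the regularization weight `lam`, and the random test patch are all assumptions made for illustration.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian peak at index (0, 0), wrapped circularly."""
    h, w = shape
    ys = np.fft.fftfreq(h) * h  # signed integer offsets 0, 1, ..., -1
    xs = np.fft.fftfreq(w) * w
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    return np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))

def train_dcf(x, y, lam=1e-2):
    """Closed-form ridge regression solved element-wise in the Fourier domain.

    Returns the conjugate filter H* = (Y . conj(X)) / (X . conj(X) + lam),
    learned from one training patch x (e.g. taken from the first frame).
    """
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return (Y * np.conj(X)) / (X * np.conj(X) + lam)

def locate(h_conj, z):
    """Correlate the filter with a search patch z and return the peak index."""
    response = np.real(np.fft.ifft2(np.fft.fft2(z) * h_conj))
    peak = np.unravel_index(np.argmax(response), response.shape)
    return tuple(int(i) for i in peak)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((32, 32))    # hypothetical single-channel feature patch
    h_conj = train_dcf(x, gaussian_label(x.shape))
    z = np.roll(x, (3, 5), axis=(0, 1))  # same patch, circularly shifted
    print(locate(h_conj, z))             # peak index estimates the shift
```

Because training and detection reduce to element-wise operations on FFTs, the cost is dominated by the transforms themselves, which is where the speed of DCF trackers comes from. In this toy setup the response peak is expected at the applied circular shift.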