When Correlation Filters Meet Convolutional Neural Networks for Visual Tracking

Chao Ma, Yi Xu, Bingbing Ni, Xiaokang Yang
DOI: https://doi.org/10.1109/lsp.2016.2601691
2016-01-01
IEEE Signal Processing Letters
Abstract: Correlation filters have been widely applied to visual tracking in recent years, as adaptive correlation filters with short-term memory are robust to large appearance changes. However, tracking methods that rely on correlation filters are prone to drifting due to noisy updates. Moreover, these methods are unable to recover from tracking failures caused by temporary or persistent heavy occlusions. In this paper, we interpret correlation filters as the counterparts of convolution filters in deep neural networks. Correlation filters encode the holistic template of target appearance, while convolution filters of smaller size encode part-based templates. In light of this idea, we propose to exploit deep convolutional networks that directly learn the mapping as a spatial correlation between two consecutive frames for visual tracking. We show that these deeply learned networks are effective in maintaining a long-term memory of target appearance for handling heavy occlusion or out-of-view targets. We further take the response maps from both the deep networks and conventional correlation filters into account to precisely locate the target. Experimental results on large-scale benchmark sequences show that the proposed algorithm performs favorably against state-of-the-art methods.
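To make the notion of a "conventional correlation filter" and its response map concrete, the following is a minimal sketch of a single-channel filter learned by ridge regression in the Fourier domain. It is not the authors' implementation. The function names (`gaussian_label`, `train_filter`, `respond`, `fuse`, `locate`), the regularization value `lam`, and the weighted-sum fusion rule are all assumptions made for illustration; the abstract does not state how the two response maps are combined.

```python
import numpy as np

def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian whose peak is moved to index (0, 0)."""
    h, w = shape
    ys = np.arange(h) - h // 2
    xs = np.arange(w) - w // 2
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    # ifftshift moves the centre element to (0, 0), i.e. "zero displacement".
    return np.fft.ifftshift(g)

def train_filter(x, y, lam=1e-2):
    """Closed-form ridge regression over all cyclic shifts of patch x."""
    X = np.fft.fft2(x)
    Y = np.fft.fft2(y)
    return Y * np.conj(X) / (X * np.conj(X) + lam)

def respond(h_hat, z):
    """Response map of the learned filter on a new patch z."""
    return np.real(np.fft.ifft2(h_hat * np.fft.fft2(z)))

def fuse(r_a, r_b, weight=0.5):
    """Hypothetical fusion rule: a plain weighted sum of two response maps."""
    return weight * r_a + (1.0 - weight) * r_b

def locate(response):
    """Row and column of the response peak, read as the estimated shift."""
    dy, dx = np.unravel_index(np.argmax(response), response.shape)
    return int(dy), int(dx)

# Usage: learn on one patch, then search a cyclically shifted copy.
rng = np.random.default_rng(0)
patch = rng.standard_normal((32, 32))
h_hat = train_filter(patch, gaussian_label(patch.shape))
shifted = np.roll(patch, (3, 5), axis=(0, 1))
estimated_shift = locate(respond(h_hat, shifted))
```

In this toy setting the response peak should follow the cyclic shift applied to the patch. In a tracker along the lines described in the abstract, the second argument to `fuse` would be a response map produced by the deep network rather than by a second correlation filter.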