Learning Fully Convolutional Network for Visual Tracking with Multi-Layer Feature Fusion

Yangliu Kuai, Gongjian Wen, Dongdong Li
DOI: https://doi.org/10.1109/access.2019.2899023
IF: 3.9
2019-01-01
IEEE Access
Abstract: Convolutional neural networks are powerful models that yield hierarchies of features. In this paper, we present a new approach to general object tracking based on a fully convolutional network with multi-layer feature fusion. The designed network combines semantic information from deep, coarse layers with appearance information from shallow, fine layers to make accurate pixel-wise objectness predictions. The network is first pretrained offline on a large set of videos with annotated heatmap ground truths to learn a general notion of foreground objects. It is then fine-tuned on the first frame to adapt to the particular object instance. During online tracking, the location of maximum target objectness in the search image is taken as the new target location, and scale estimation is handled by incorporating a correlation filter branch into the network. An efficient updating strategy is proposed to further improve tracking performance. Extensive experiments on the widely used tracking benchmark OTB100 show that the proposed algorithm outperforms many other state-of-the-art trackers.
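The abstract does not give layer names, strides, or the exact fusion operator. The sketch below is a minimal illustration, not the authors' implementation. It shows how a coarse deep-layer score map and a fine shallow-layer score map might be fused, and how the peak of the resulting objectness map becomes the new target location. It assumes FCN-style fusion (nearest-neighbour upsampling of the coarse map followed by element-wise summation) and a sigmoid to turn fused scores into objectness. It also assumes a hypothetical network stride of 8 pixels for mapping score-map cells back to search-image pixels. The correlation-filter scale branch and the updating strategy are not modelled here.

```python
import numpy as np


def upsample_nearest(score: np.ndarray, factor: int) -> np.ndarray:
    """Nearest-neighbour upsampling of a 2-D score map by an integer factor."""
    return np.repeat(np.repeat(score, factor, axis=0), factor, axis=1)


def fuse_scores(deep: np.ndarray, shallow: np.ndarray) -> np.ndarray:
    """Assumed FCN-style fusion: upsample the coarse deep map, add the fine map."""
    factor = shallow.shape[0] // deep.shape[0]
    up = upsample_nearest(deep, factor)
    if up.shape != shallow.shape:
        raise ValueError("deep map must upsample exactly to the shallow map's size")
    return up + shallow


def objectness(fused: np.ndarray) -> np.ndarray:
    """Map fused scores to per-pixel objectness in (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + np.exp(-fused))


def locate_target(obj: np.ndarray, stride: int = 8) -> tuple[int, int]:
    """Return the (row, col) search-image pixel at the centre of the peak cell.

    `stride` is a hypothetical value; the real network stride is not stated.
    """
    r, c = np.unravel_index(np.argmax(obj), obj.shape)
    return int(r) * stride + stride // 2, int(c) * stride + stride // 2


if __name__ == "__main__":
    # Coarse deep map: semantics say the target is in the bottom-left quadrant.
    deep = np.zeros((2, 2))
    deep[1, 0] = 2.0
    # Fine shallow map: appearance cues pick out one cell inside that quadrant.
    shallow = np.zeros((4, 4))
    shallow[3, 1] = 1.0
    obj = objectness(fuse_scores(deep, shallow))
    print(locate_target(obj))  # (28, 12)
```

In this toy input the deep map alone cannot separate the four fine cells of the bottom-left quadrant. Adding the shallow map breaks the tie at cell (3, 1), which is the intuition behind combining coarse semantic and fine appearance information. Because the sigmoid is monotonic, taking the argmax of objectness gives the same location as taking it over the fused scores.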