Region-based High-resolution Siamese Network for Robust Visual Tracking

Chunbao Li, Bo Yang
DOI: https://doi.org/10.1145/3354031.3354051
2019-01-01
Abstract: Visual tracking is an active and challenging research topic in computer vision, as objects often undergo significant appearance variations caused by occlusion, deformation and background clutter. In recent years, many trackers based on convolutional neural networks have achieved impressive performance by integrating multi-layer features. However, in order to conduct multi-scale feature fusion, most of these trackers recover high-resolution representations from low-resolution representations produced by a high-to-low resolution network, which tends to result in inaccurate feature maps or loss of details of the target object. In this paper, we propose an end-to-end region-based high-resolution fully convolutional Siamese network for tracking. In the tracker, we propose to extract the spatial information and semantic information of the target object using a high-resolution network that maintains rich high-resolution representations of the target object through the whole process. Furthermore, a set of position-sensitive score maps is obtained for all regions of the target template, and an adaptive weighting method is proposed to fuse the score maps of multiple regions. Experimental results on the OTB50 and OTB100 benchmark datasets demonstrate that our tracker performs better than several state-of-the-art trackers while running in real time.
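The fusion step described in the abstract (one score map per template region, combined with adaptive weights) can be illustrated with a minimal sketch. The abstract does not state the weighting rule, so the one used here, a softmax over each region's peak response, is an assumption chosen only for illustration and is not the authors' method. The names `fuse_score_maps`, `softmax`, `argmax2d` and the `temperature` parameter are hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp((v - m) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_score_maps(score_maps, temperature=1.0):
    """Fuse equally sized per-region score maps (2D lists) into one map.

    ASSUMPTION: each region's weight is the softmax of its peak response,
    so regions with a confident, sharp peak dominate the fused map. The
    paper's actual adaptive weighting rule is not given in the abstract.
    """
    peaks = [max(max(row) for row in m) for m in score_maps]
    weights = softmax(peaks, temperature)
    height, width = len(score_maps[0]), len(score_maps[0][0])
    fused = [
        [sum(w * m[r][c] for w, m in zip(weights, score_maps)) for c in range(width)]
        for r in range(height)
    ]
    return fused, weights


def argmax2d(score_map):
    """Return (row, col) of the highest value in a 2D list."""
    best = max((v, r, c) for r, row in enumerate(score_map) for c, v in enumerate(row))
    return best[1], best[2]


# Toy data: region_a responds sharply at (1, 2); region_b is weak and
# misleading, as might happen when that part of the target is occluded.
region_a = [[0.0, 0.1, 0.2], [0.1, 0.3, 5.0], [0.0, 0.2, 0.4]]
region_b = [[1.0, 0.9, 0.8], [0.9, 0.8, 0.7], [0.8, 0.7, 0.6]]

fused, weights = fuse_score_maps([region_a, region_b])
target_row, target_col = argmax2d(fused)
```

In this toy case the confident region receives nearly all of the weight, so the fused peak stays at the location reported by `region_a` even though `region_b` points elsewhere.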