Deep Convolutional Correlation Filter Learning Toward Robust Visual Object Tracking

Tayssir Bouraffa, Zihang Feng, Yuxuan Wang, Liping Yan, Yuanqing Xia, Bo Xiao
DOI: https://doi.org/10.1109/ccdc55256.2022.10034306
2022-01-01
Abstract: Convolutional neural networks have recently been widely adopted in visual object tracking for their ability to discriminate the target from the surrounding background. Most visual object trackers extract deep features from a single layer, generally the last convolutional layer. However, these trackers are less effective when the target undergoes drastic appearance variations caused by challenging conditions such as occlusion, illumination change, and background clutter. In this paper, a novel tracking algorithm is developed by introducing an elastic net constraint and contextual information into the convolutional network to track the desired target throughout a video sequence. Hierarchical features are extracted from both the shallow and the deep convolutional layers to further improve tracking accuracy and robustness. The deep convolutional layers capture semantic information and are therefore more robust to target appearance variations, whereas the shallow convolutional layers encode spatial details that localize the target more precisely. Moreover, Peak-Strength Context-Aware correlation filters are embedded at the output of each convolutional layer, producing multi-level convolutional response maps that collaboratively estimate the target position in a coarse-to-fine manner. Quantitative and qualitative experiments on the widely used OTB-2015 benchmark show impressive results compared with state-of-the-art trackers.
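The coarse-to-fine localization mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's actual fusion rule, so everything here is an assumption for illustration: the function names `peak` and `coarse_to_fine`, the search radius, the toy response maps, and the premise that all maps have already been resized to the same resolution. The idea shown is only the general one: the deepest layer's response map gives a coarse position, and each shallower map refines it within a small window around the previous estimate.

```python
from typing import List, Optional, Sequence, Tuple

Map = Sequence[Sequence[float]]  # a 2D response map stored as rows of floats


def peak(response: Map,
         center: Optional[Tuple[int, int]] = None,
         radius: Optional[int] = None) -> Tuple[int, int]:
    """Return (row, col) of the largest response.

    If center and radius are given, search only the square window of that
    radius around center (clipped to the map borders).
    """
    rows, cols = len(response), len(response[0])
    if center is None or radius is None:
        r0, r1, c0, c1 = 0, rows, 0, cols
    else:
        r0, r1 = max(0, center[0] - radius), min(rows, center[0] + radius + 1)
        c0, c1 = max(0, center[1] - radius), min(cols, center[1] + radius + 1)
    best = (r0, c0)
    for r in range(r0, r1):
        for c in range(c0, c1):
            if response[r][c] > response[best[0]][best[1]]:
                best = (r, c)
    return best


def coarse_to_fine(responses: List[Map], radius: int = 1) -> Tuple[int, int]:
    """Estimate the target position from maps ordered deepest to shallowest.

    Hypothetical scheme: take the global peak of the deepest map, then let
    each shallower map refine it inside a local window.
    """
    position = peak(responses[0])
    for response in responses[1:]:
        position = peak(response, position, radius)
    return position


# Toy 5x5 maps (made-up values). The deep map has one clear peak at (2, 2).
deep = [[0.1] * 5 for _ in range(5)]
deep[2][2] = 1.0

# The shallow map has a strong distractor at (0, 4) and the true peak at (2, 3).
shallow = [[0.0] * 5 for _ in range(5)]
shallow[0][4] = 0.9
shallow[2][3] = 0.8

estimate = coarse_to_fine([deep, shallow], radius=1)  # (2, 3)
```

In this toy case the shallow map alone would pick the distractor at (0, 4), while the window around the deep-layer estimate keeps the refinement at (2, 3). This reflects the division of labor the abstract describes: deep layers for robustness, shallow layers for precise localization.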