Triple Attention and Global Reasoning Siamese Networks for Visual Tracking
Shu Ping, Xu Keying, Bao Hua
DOI: https://doi.org/10.1007/s00138-022-01301-1
IF: 2.983
2022-01-01
Machine Vision and Applications
Abstract: As a fundamental problem in computer vision, object tracking aims to capture accurate information about a given target throughout a video sequence, with the initial information determined in the first frame. Despite significant improvement over the past decades, trackers still face various challenges, including occlusion, deformation, and fast motion. To attain robust performance, this work presents a tracking algorithm based on a triple attention mechanism and a global reasoning model, inspired by recent progress in Siamese networks. First, to address insufficient feature extraction, a triple attention model is proposed, which consists of three parts: a squeeze-and-excitation (SE) block, a spatial SE (sSE) block, and a channel SE (cSE) block. Second, to tackle the lack of context information during tracking, a global reasoning model is added to both the template branch and the search branch, which generates two different score maps. As tracking proceeds, these two score maps are summed, each with its own weight, to construct a regression confidence map. Extensive experiments on existing benchmarks, including OTB50, OTB100, VOT2016, VOT2018, GOT-10k, LaSOT, NFS, and TC128, demonstrate that the proposed method achieves competitive results.
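The weighted summation of the two score maps mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say how the weights are chosen or what the map sizes are, so the function names `fuse_score_maps` and `peak_location`, the single `weight_a` parameter (with the second weight taken as `1 - weight_a`), and the toy 5x5 maps are all assumptions made for illustration, not the authors' implementation.

```python
def fuse_score_maps(map_a, map_b, weight_a=0.5):
    """Weighted sum of two equally sized 2-D score maps (lists of rows).

    Assumption: the two weights sum to 1, so only weight_a is passed in.
    """
    same_shape = len(map_a) == len(map_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(map_a, map_b)
    )
    if not same_shape:
        raise ValueError("score maps must have the same shape")
    if not 0.0 <= weight_a <= 1.0:
        raise ValueError("weight_a must lie in [0, 1]")
    weight_b = 1.0 - weight_a
    return [
        [weight_a * a + weight_b * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(map_a, map_b)
    ]


def peak_location(score_map):
    """Return (row, col) of the largest response in a 2-D score map."""
    _, row, col = max(
        (
            (value, r, c)
            for r, line in enumerate(score_map)
            for c, value in enumerate(line)
        ),
        key=lambda item: item[0],
    )
    return row, col


# Toy maps (hypothetical values): each map peaks at a different cell.
map_a = [[0.0] * 5 for _ in range(5)]
map_b = [[0.0] * 5 for _ in range(5)]
map_a[1][1] = 1.0
map_b[3][3] = 0.8

fused = fuse_score_maps(map_a, map_b, weight_a=0.3)
print(peak_location(fused))  # (3, 3): 0.7 * 0.8 outweighs 0.3 * 1.0
```

In this toy setting the fused peak follows the more heavily weighted map, which shows how the weights decide which score map dominates the final confidence map.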