SCATT: Transformer tracking with symmetric cross-attention

Jianming Zhang,Wentao Chen,Jiangxin Dai,Jin Zhang
DOI: https://doi.org/10.1007/s10489-024-05467-1
IF: 5.3
2024-05-08
Applied Intelligence
Abstract:In the popular Siamese network tracker, cross-correlation is based on the similarity to find the exact location of the template in the search region. However, due to cross-correlation primarily focuses on the spatial neighborhoods, so it often falls into local optimum. Additionally, multiple fusions of features results in a degrade of the target position information. To address these issues, we purpose a novel transformer-variant tracker. We make cross-attention play a central role in our tracker, and thus propose a novel symmetric cross-attention that effectively fuses the features of the template and the search region. The symmetric cross-attention only uses the cross-attention mechanism so as to get rid of the cross-correlation operation, which avoids local optimum and captures more global information. We also propose a position information enhancement module preserving more horizontal and vertical position information, which avoids the loss of position information caused by multiple fusions of features and helps the tracker to locate the target more accurately. Our proposed tracker achieves state-of-the-art performance on six benchmarks including GOT-10k, TrackingNet, LaSOT, UAV123, OTB100, and VOT2020, and is able to run at real-time speed.
computer science, artificial intelligence
What problem does this paper attempt to address?