MLGT: multi-local guided tracker for visual object tracking
Xingzhu Liang, Miaomiao Chen, Erhu Liu
DOI: https://doi.org/10.1007/s11554-024-01418-8
IF: 2.293
2024-03-21
Journal of Real-Time Image Processing
Abstract: Existing single-stream tracking pipelines achieve strong performance by performing feature extraction and interaction jointly. These pipelines establish a bidirectional information flow between the template frame and the search frame. They exploit the correlation and dynamic changes between the two frames to improve how the object is modeled and represented, and thereby improve the accuracy and robustness of tracking. However, they use only the highest-level semantic information from the encoder, and the low-level features serve only to compute new activations, which cannot meet the fine-grained requirements of the tracking task. To address this issue, we propose a new approach named multi-local guided tracker (MLGT), which merges features obtained at various depths to strengthen the interaction between different levels of semantic information. Specifically, we divide the single-stream pipeline into fixed output stages, each responsible for extracting and processing a different level of features. We then pass the output features into an enhanced fusion module (EFM), which incorporates a shared encoder and a concatenation operation: the encoder further extracts information from the joint features, and the concatenation operation fuses the features from the different output stages. We conduct extensive evaluations on five datasets. On LaSOT, MLGT achieves 70.5% SUC, 1.4 percentage points higher than the existing single-stream tracker OSTrack.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic; Imaging Science & Photographic Technology
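The abstract describes the multi-stage fusion only at a high level. The pure-Python toy sketch below illustrates the general idea under stated assumptions. It collects the joint template/search token features at fixed stage boundaries of a single-stream backbone, passes every stage output through one shared encoder (the same weights for each stage), and concatenates the results channel-wise per token. Everything concrete here is hypothetical and not taken from the paper: the 12-block/3-stage split, the residual stand-in blocks, the linear+ReLU encoder, and the helper names `make_linear`, `run_stages`, `enhanced_fusion` and `shared_encoder`. It is not the authors' EFM implementation.

```python
import random
from typing import Callable, List, Sequence, Set

Vector = List[float]
Tokens = List[Vector]  # one feature vector per token of the joint template+search sequence


def make_linear(d_in: int, d_out: int, seed: int) -> Callable[[Vector], Vector]:
    """Hypothetical linear layer with fixed random weights (no bias)."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(d_in)] for _ in range(d_out)]

    def apply(x: Vector) -> Vector:
        return [sum(w * v for w, v in zip(row, x)) for row in weights]

    return apply


def run_stages(tokens: Tokens, blocks: Sequence[Callable[[Vector], Vector]],
               stage_ends: Set[int]) -> List[Tokens]:
    """Run the blocks in order and keep the token features at the end of each stage."""
    outputs: List[Tokens] = []
    x = tokens
    for i, block in enumerate(blocks, start=1):
        # Stand-in for a transformer block (assumption): a per-token residual update.
        x = [[a + b for a, b in zip(t, block(t))] for t in x]
        if i in stage_ends:
            outputs.append(x)
    return outputs


def enhanced_fusion(stage_outputs: List[Tokens],
                    shared_encoder: Callable[[Vector], Vector]) -> Tokens:
    """Encode every stage output with the SAME encoder, then concatenate per token."""
    encoded = [[shared_encoder(t) for t in out] for out in stage_outputs]
    n_tok = len(encoded[0])
    # Channel-wise concatenation: token k becomes [stage1_k, stage2_k, ..., stageK_k].
    return [[v for stage in encoded for v in stage[k]] for k in range(n_tok)]


# --- usage with hypothetical sizes: 12 blocks split into 3 stages of 4 ---
d, n_tokens, n_blocks = 8, 5, 12
blocks = [make_linear(d, d, seed=i) for i in range(n_blocks)]
stage_ends = {4, 8, 12}

rng = random.Random(0)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(n_tokens)]

linear = make_linear(d, d, seed=99)


def shared_encoder(x: Vector) -> Vector:
    """Assumed encoder: one linear layer followed by ReLU, reused for every stage."""
    return [max(0.0, v) for v in linear(x)]


stage_outputs = run_stages(tokens, blocks, stage_ends)
fused = enhanced_fusion(stage_outputs, shared_encoder)
print(len(stage_outputs), len(fused), len(fused[0]))  # 3 5 24
```

The design point the sketch is meant to show is weight sharing: because one encoder processes every stage, low-, mid- and high-level features are mapped through the same transformation before concatenation. The fused vector width therefore grows with the number of stages (here 3 × 8 = 24), while the encoder's parameter count does not.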