Memory Network with Pixel-level Spatio-Temporal Learning for Visual Object Tracking
Zechu Zhou, Xinyu Zhou, Zhaoyu Chen, Pinxue Guo, Qian-Yu Liu, Wenqiang Zhang
DOI: https://doi.org/10.1109/tcsvt.2023.3272319
IF: 5.859
2023-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Making full use of temporal and spatial information is critical for coping with appearance changes of objects in visual object tracking. However, existing tracking methods that employ a frame-level memory network to learn this information introduce redundancy and, because of the limited memory size, cannot build long-term relationships among historical frames. In this paper, we propose a novel memory network, Pixel-level Spatio-Temporal Memory (PSTM), which organizes object features efficiently to leverage temporal and spatial context information. Specifically, PSTM is constructed and updated by a memory writer, which uses a pixel-level updating strategy to maintain temporal consistency and dynamically memorize noteworthy variations. Furthermore, to exploit relationships between the object and the search region and to precisely estimate the state of the object, we propose a memory reader, the Pixel-wise Matching and Refinement module (PMR), which models spatial context without a complex manually designed mechanism. Comprehensive experiments and comparisons on challenging large-scale benchmarks, including GOT-10k, TrackingNet, LaSOT, OTB2015, VOT2020, and NfS, demonstrate the effectiveness of the proposed method, which performs favorably against state-of-the-art trackers.
engineering, electrical & electronic
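
Illustrative sketch (not from the paper): the abstract describes a memory writer that updates memory at pixel level and a memory reader that matches search-region pixels against it, but gives no formulas. The plain-Python code below is only one possible reading of that idea, under stated assumptions: pixel features are short lists of floats, similarity is cosine similarity, similar pixels are merged by a moving average, and reading is a softmax-weighted average. The names write_memory and read_memory and the parameters sim_thresh, momentum, max_size, and temperature are hypothetical and do not come from PSTM or PMR.

```python
import math


def cosine(a, b):
    """Cosine similarity between two feature vectors (lists of floats)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb + 1e-12)


def write_memory(memory, pixels, sim_thresh=0.9, momentum=0.5, max_size=1000):
    """Assumed pixel-level update: merge similar pixels, append novel ones."""
    for p in pixels:
        if memory:
            sims = [cosine(m, p) for m in memory]
            best = max(range(len(memory)), key=sims.__getitem__)
            if sims[best] >= sim_thresh:
                # A similar pixel is already stored: blend the two so the
                # entry changes smoothly over time instead of duplicating.
                memory[best] = [momentum * m + (1 - momentum) * x
                                for m, x in zip(memory[best], p)]
                continue
        if len(memory) < max_size:
            # No close match: treat it as a new variation and store it.
            memory.append(list(p))
    return memory


def read_memory(memory, query, temperature=1.0):
    """Assumed pixel-wise matching: softmax-weighted readout for one query pixel."""
    if not memory:
        raise ValueError("memory is empty")
    scores = [cosine(m, query) / temperature for m in memory]
    top = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    dim = len(memory[0])
    return [sum(w / total * m[d] for w, m in zip(weights, memory))
            for d in range(dim)]


memory = []
write_memory(memory, [[1.0, 0.0], [0.0, 1.0]])  # two dissimilar pixels
write_memory(memory, [[0.99, 0.05]])            # near-duplicate of the first
readout = read_memory(memory, [1.0, 0.0], temperature=0.1)
```

In this toy run the near-duplicate pixel is merged into an existing entry rather than stored again, which is the kind of redundancy reduction the abstract attributes to pixel-level memory, and the readout for the query is dominated by the closest stored pixel. A real tracker would operate on learned feature maps and a trained refinement head, which this sketch does not attempt to model.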