Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks
Jongwon Ra, MengMeng Wang, Jianbiao Mei, Shanqi Liu, Yu Yang, Yong Liu
DOI: https://doi.org/10.1109/3dv62453.2024.00050
2024-01-01
Abstract: Point cloud-based 3D single object tracking plays an indispensable role in autonomous driving. However, applying 3D object tracking in the real world remains challenging because of the inherent sparsity and self-occlusion of point cloud data, so it is necessary to exploit as much useful information as possible from limited data. Because 3D object tracking is a video-level task, the appearance of objects changes gradually over time, and historical frames carry rich spatiotemporal contextual information. However, existing methods do not fully utilize this information. To address this, we propose a new method called SCTrack, which uses a memory-based paradigm to exploit spatiotemporal contextual information. SCTrack incorporates both long-term and short-term memory banks to store the spatiotemporal features of targets from historical frames. By doing so, the tracker can benefit from the entire video sequence and make more informed predictions. Additionally, SCTrack extracts a mask prior to augment the target representation, improving target-background discriminability. Extensive experiments on the KITTI, nuScenes, and Waymo Open datasets verify the effectiveness of our proposed method.
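
The idea of keeping both a short-term and a long-term memory bank can be illustrated with a minimal sketch. This is not the SCTrack implementation: the abstract does not specify how the banks are updated or read, so the class name `TwoBankMemory`, the parameters `short_size`, `long_size`, and `long_stride`, and the update rule (the short-term bank keeps the most recent frames, the long-term bank keeps every `long_stride`-th frame) are all assumptions made for illustration.

```python
from collections import deque


class TwoBankMemory:
    """Hypothetical two-bank feature memory (illustrative, not SCTrack's code)."""

    def __init__(self, short_size=3, long_size=4, long_stride=5):
        # Short-term bank: the most recent `short_size` per-frame features.
        self.short = deque(maxlen=short_size)
        # Long-term bank: sparsely sampled features covering a longer history.
        self.long = deque(maxlen=long_size)
        self.long_stride = long_stride  # assumed sampling interval, in frames
        self.frames_seen = 0

    def update(self, feature):
        """Store the target feature of the current frame."""
        self.short.append(feature)
        # Assumed rule: every `long_stride`-th frame also enters the long-term bank.
        if self.frames_seen % self.long_stride == 0:
            self.long.append(feature)
        self.frames_seen += 1

    def read(self):
        """Return stored features, long-term entries first, then short-term."""
        return list(self.long) + list(self.short)
```

In a tracker, `feature` would be a per-frame target feature tensor, and `read()` would supply the historical context for predicting the target in the current frame. A real system would likely replace the fixed-stride rule with a learned or confidence-based update.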