Context Matching-Guided Motion Modeling for 3D Point Cloud Object Tracking
Jiahao Nie, Anqi Xu, Zhengyi Bao, Zhiwei He, Xudong Lv, Mingyu Gao
DOI: https://doi.org/10.1109/tcsvt.2024.3498853
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: LiDAR-based single object tracking plays a key role in intelligent vehicles. Current methods typically follow either an appearance matching framework or a motion-centric framework. However, point clouds are usually sparse and incomplete, providing insufficient appearance information for matching. While the motion-centric framework predicts the inter-frame motion of targets instead of performing appearance matching, it neglects matching contextual information across consecutive frames, which is conducive to target motion modeling. In this paper, we propose an elegant and effective framework that leverages Context Matching to guide motion modeling for accurate Tracking (CMTrack). The novel framework possesses two attractive properties: 1) It incorporates a context matching encoder-decoder network to match contextual information of consecutive frames, fully exploring informative cues relevant to target motion. 2) Benefiting from the modeling of these informative motion cues, CMTrack allows for accurate prediction of the inter-frame motion of targets in a one-stage manner. Extensive experiments are conducted on several widely adopted datasets, i.e., KITTI, NuScenes and Waymo Open Dataset. Without bells and whistles, our CMTrack demonstrates competitive tracking accuracy (e.g., 87.3% and 69.3% precision on KITTI and NuScenes, respectively) compared to state-of-the-art methods, while running at a high speed of 48 FPS on a single Titan Xp GPU.
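The motion-centric idea referred to in the abstract, predicting inter-frame motion and applying it to the previous target state, can be illustrated with a minimal sketch. This is not the CMTrack implementation: the `Box3D` type, the `apply_relative_motion` helper, the 4-parameter motion (dx, dy, dz, dyaw), and the choice to express that motion in the previous box's own frame are all assumptions made for illustration. The box size is assumed fixed and is omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class Box3D:
    """Hypothetical target state: box center and heading (size omitted)."""
    x: float
    y: float
    z: float
    yaw: float  # heading around the vertical axis, in radians


def apply_relative_motion(prev: Box3D, dx: float, dy: float,
                          dz: float, dyaw: float) -> Box3D:
    """Move the previous box by a predicted inter-frame motion.

    Assumption: (dx, dy) are expressed in the previous box's frame,
    so they are rotated by the previous heading before being added.
    """
    c, s = math.cos(prev.yaw), math.sin(prev.yaw)
    new_x = prev.x + c * dx - s * dy
    new_y = prev.y + s * dx + c * dy
    # Wrap the updated heading into [-pi, pi).
    new_yaw = (prev.yaw + dyaw + math.pi) % (2.0 * math.pi) - math.pi
    return Box3D(new_x, new_y, prev.z + dz, new_yaw)


# Usage: a target heading along +y that moves 1 m forward in its own frame.
prev_box = Box3D(x=1.0, y=2.0, z=0.0, yaw=math.pi / 2)
curr_box = apply_relative_motion(prev_box, dx=1.0, dy=0.0, dz=0.0, dyaw=0.0)
```

In a tracker of this kind, the motion parameters would come from the network's prediction for each pair of consecutive frames, and the updated box would serve as the previous state for the next frame.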