FlowTrack: Point-level Flow Network for 3D Single Object Tracking

Shuo Li, Yubo Cui, Zhiheng Li, Zheng Fang
2024-07-02
Abstract: 3D single object tracking (SOT) is a crucial task in the fields of mobile robotics and autonomous driving. Traditional motion-based approaches achieve target tracking by estimating the relative movement of the target between two consecutive frames. However, they usually overlook the local motion information of the target and fail to exploit historical frame information effectively. To overcome these limitations, we propose FlowTrack, a point-level flow method with multi-frame information for the 3D SOT task. Specifically, by estimating the flow for each point in the target, our method captures the local motion details of the target, thereby improving tracking performance. At the same time, to handle scenes with sparse points, we present a learnable target feature as a bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head to transform dense point-level flow into instance-level motion, effectively aggregating local motion information to obtain the global target motion. Finally, our method achieves competitive performance, with improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes. The code will be made publicly available soon.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper aims to address several key issues in 3D Single Object Tracking (3D SOT):

1. **Neglect of Local Motion Information**: Traditional motion-based methods usually focus only on the relative motion of the target between two consecutive frames, ignoring the local motion details of the target. This neglect leads to a decline in tracking performance, especially in cases where the target is occluded or there are similar distractors.
2. **Insufficient Utilization of Historical Frame Information**: Existing methods often only use short-term motion cues between two frames and fail to effectively utilize geometric and motion information from historical frames. This limits the tracking performance in sparse point cloud scenarios.
3. **Limitations of Matching Methods**: Matching-based methods (such as Siamese networks) face challenges when dealing with sparse point clouds, target occlusion, and long-distance target tracking, as these methods rely on appearance matching, and the sparsity and disorder of point clouds make appearance matching difficult.

To overcome the above issues, the authors propose a multi-frame point-level flow network (FlowTrack), which captures the local motion details of the target by estimating the flow of each point and utilizes historical frame information to improve tracking performance. Specifically, FlowTrack includes the following modules:

- **Historical Information Fusion Module (HIM)**: Efficiently integrates target information from historical frames into the template frame through learnable target features, supplementing the target's geometric and motion information.
- **Point-level Motion Module (PMM)**: Extracts multi-scale point-level flow features to capture the local motion details of the target.
- **Instance Flow Head (IFH)**: Converts point-level flow into instance-level motion, achieving precise relative motion estimation of the target.

Through these innovations, FlowTrack achieves competitive performance, with reported improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes.
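
To make the idea of turning point-level flow into instance-level motion more concrete, the following is a minimal sketch and not the paper's Instance Flow Head, which is a learned module whose internals are not described here. It assumes a bird's-eye-view (2D) setting and swaps in a closed-form weighted least-squares rigid fit: given target points and a flow vector per point, it recovers one yaw angle and one translation for the whole object. The function name `instance_motion_from_flow` and the optional `weights` argument are hypothetical, introduced only for illustration.

```python
import math


def instance_motion_from_flow(points, flows, weights=None):
    """Fit one 2D rigid motion (yaw, tx, ty) to per-point flow vectors.

    Illustrative assumption: the target moves rigidly in the x-y plane,
    so each point p should land at R(yaw) @ p + t. This is a closed-form
    least-squares fit, not the learned head from the paper.
    """
    n = len(points)
    if n == 0 or n != len(flows):
        raise ValueError("points and flows must be non-empty and equal in length")
    if weights is None:
        weights = [1.0] * n  # treat every point as equally reliable
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Where each point ends up after applying its own flow vector.
    targets = [(px + fx, py + fy) for (px, py), (fx, fy) in zip(points, flows)]

    # Weighted centroids of the source and displaced point sets.
    pcx = sum(w * p[0] for w, p in zip(weights, points)) / total
    pcy = sum(w * p[1] for w, p in zip(weights, points)) / total
    qcx = sum(w * q[0] for w, q in zip(weights, targets)) / total
    qcy = sum(w * q[1] for w, q in zip(weights, targets)) / total

    # Accumulate the cosine-like and sine-like terms of the optimal rotation.
    s_cos = 0.0
    s_sin = 0.0
    for w, (px, py), (qx, qy) in zip(weights, points, targets):
        ax, ay = px - pcx, py - pcy
        bx, by = qx - qcx, qy - qcy
        s_cos += w * (ax * bx + ay * by)
        s_sin += w * (ax * by - ay * bx)

    yaw = math.atan2(s_sin, s_cos)
    c, s = math.cos(yaw), math.sin(yaw)

    # Translation that maps the rotated source centroid onto the target centroid.
    tx = qcx - (c * pcx - s * pcy)
    ty = qcy - (s * pcx + c * pcy)
    return yaw, tx, ty


# Usage: synthesize flow from a known motion, then recover that motion.
pts = [(1.0, 0.0), (0.0, 2.0), (-1.5, 0.5), (2.0, 2.0)]
c, s = math.cos(0.1), math.sin(0.1)
flow = [(c * x - s * y + 1.0 - x, s * x + c * y - 0.5 - y) for x, y in pts]
yaw, tx, ty = instance_motion_from_flow(pts, flow)
```

Because every point contributes to the fit, noise in individual flow vectors is averaged out, which is one intuition for why aggregating dense local motion can yield a more stable global estimate than a single box-level regression. A learned head could additionally predict the per-point weights, which is left as an assumption here.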