Abstract: 3D single object tracking (SOT) is a crucial task in the fields of mobile robotics and autonomous driving. Traditional motion-based approaches track the target by estimating its relative movement between two consecutive frames. However, they usually overlook the local motion information of the target and fail to exploit historical frame information effectively. To overcome these limitations, we propose FlowTrack, a point-level flow method with multi-frame information for the 3D SOT task. Specifically, by estimating the flow for each point in the target, our method captures the local motion details of the target, thereby improving tracking performance. At the same time, to handle scenes with sparse points, we present a learnable target feature as a bridge to efficiently integrate target information from past frames. Moreover, we design a novel Instance Flow Head that transforms dense point-level flow into instance-level motion, effectively aggregating local motion information to obtain the global target motion. Finally, our method achieves competitive performance, with improvements of 5.9% on the KITTI dataset and 2.9% on NuScenes. The code will be made publicly available soon.

What problem does this paper attempt to address?

### Problems Addressed by the Paper

This paper aims to address several key issues in 3D Single Object Tracking (3D SOT):

1. **Neglect of Local Motion Information**: Traditional motion-based methods usually focus only on the relative motion of the target between two consecutive frames, ignoring the local motion details of the target. This neglect leads to a decline in tracking performance, especially when the target is occluded or similar distractors are present.
2. **Insufficient Utilization of Historical Frame Information**: Existing methods often use only short-term motion cues between two frames and fail to effectively utilize geometric and motion information from historical frames. This limits tracking performance in sparse point cloud scenarios.
3. **Limitations of Matching Methods**: Matching-based methods (such as Siamese networks) face challenges when dealing with sparse point clouds, target occlusion, and long-distance target tracking, because they rely on appearance matching, and the sparsity and disorder of point clouds make appearance matching difficult.

To overcome these issues, the authors propose a multi-frame point-level flow network (FlowTrack), which captures the local motion details of the target by estimating the flow of each point and utilizes historical frame information to improve tracking performance. Specifically, FlowTrack includes the following modules:

- **Historical Information Fusion Module (HIM)**: Efficiently integrates target information from historical frames into the template frame through learnable target features, supplementing the target's geometric and motion information.
- **Point-level Motion Module (PMM)**: Extracts multi-scale point-level flow features to capture the local motion details of the target.
- **Instance Flow Head (IFH)**: Converts point-level flow into instance-level motion, achieving precise relative motion estimation of the target.

Through these innovations, FlowTrack achieves significant performance improvements on the KITTI and NuScenes datasets, with increases of 5.9% and 2.9%, respectively.
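
To make the idea of converting point-level flow into instance-level motion concrete, the sketch below fits a single rigid motion (yaw plus translation) to per-point flow vectors by least squares. This is a classical, non-learned stand-in and not the paper's Instance Flow Head, which is a learned module whose internals are not described here. The function name `instance_motion_from_flow` and the `(tx, ty, tz, yaw)` output convention are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float, float]


def instance_motion_from_flow(
    points: List[Point], flows: List[Point]
) -> Tuple[float, float, float, float]:
    """Fit one rigid motion (yaw about z, then translation) to per-point flow.

    Hypothetical helper for illustration only: a closed-form least-squares fit,
    not the learned Instance Flow Head from the paper.
    Returns (tx, ty, tz, yaw) such that q ~= R(yaw) @ p + t for each point p
    and its flowed position q = p + flow.
    """
    if not points or len(points) != len(flows):
        raise ValueError("points and flows must be non-empty and equally long")
    n = len(points)

    # Position of every point after applying its own flow vector.
    moved = [(p[0] + f[0], p[1] + f[1], p[2] + f[2]) for p, f in zip(points, flows)]

    # Centroids before and after the motion.
    pc = [sum(p[k] for p in points) / n for k in range(3)]
    qc = [sum(q[k] for q in moved) / n for k in range(3)]

    # 2D Procrustes fit in the x-y plane: accumulate dot and cross products
    # of the centered coordinates; their ratio gives the best-fit yaw.
    cos_sum = 0.0
    sin_sum = 0.0
    for p, q in zip(points, moved):
        ax, ay = p[0] - pc[0], p[1] - pc[1]
        bx, by = q[0] - qc[0], q[1] - qc[1]
        cos_sum += ax * bx + ay * by
        sin_sum += ax * by - ay * bx
    yaw = math.atan2(sin_sum, cos_sum)

    # Translation that maps the rotated source centroid onto the target centroid.
    c, s = math.cos(yaw), math.sin(yaw)
    tx = qc[0] - (c * pc[0] - s * pc[1])
    ty = qc[1] - (s * pc[0] + c * pc[1])
    tz = qc[2] - pc[2]
    return tx, ty, tz, yaw


if __name__ == "__main__":
    # Four corners of a box-like target, moved by a known yaw and translation.
    pts = [(2.0, 1.0, 0.0), (2.0, -1.0, 0.0), (-2.0, -1.0, 0.5), (-2.0, 1.0, 0.5)]
    true_yaw, true_t = 0.1, (1.0, -0.5, 0.2)
    c, s = math.cos(true_yaw), math.sin(true_yaw)
    flows = [
        (c * x - s * y + true_t[0] - x, s * x + c * y + true_t[1] - y, true_t[2])
        for x, y, z in pts
    ]
    print(instance_motion_from_flow(pts, flows))
```

Averaging over all points is what lets local, per-point estimates produce one global motion. A learned head can additionally down-weight unreliable points, which this closed-form fit does not attempt.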

FlowTrack: Point-level Flow Network for 3D Single Object Tracking
