Abstract: Tracking and reconstructing 3D objects from cluttered scenes are key components of computer vision, robotics, and autonomous driving systems. While recent progress in implicit functions has shown encouraging results on high-quality 3D shape reconstruction, it is still very challenging to generalize to cluttered and partially observable LiDAR data. In this paper, we propose to leverage the continuity in video data. We introduce a novel and unified framework which utilizes a neural implicit function to simultaneously track and reconstruct 3D objects in the wild. Our approach adapts the DeepSDF model (i.e., an instantiation of the implicit function) in the video online, iteratively improving the shape reconstruction while in return improving the tracking, and vice versa. We experiment with both Waymo and KITTI datasets and show significant improvements over state-of-the-art methods for both tracking and shape reconstruction tasks. Our project page is at https://jianglongye.com/implicit-tracking.
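
The alternation the abstract describes (a better shape improves tracking, and better tracking improves the shape) can be illustrated with a toy example. This is a minimal sketch, not the authors' implementation: a sphere SDF whose radius acts as a one-parameter "shape code" stands in for DeepSDF, the pose is a translation only, the first-frame pose is assumed to be given, and every function name here is hypothetical.

```python
import numpy as np

def fit_shape(points_obj):
    """Shape step: least-squares radius r so that ||p|| - r is near 0 for all points."""
    return float(np.mean(np.linalg.norm(points_obj, axis=1)))

def estimate_pose(points, radius, t_init, lr=0.5, steps=100):
    """Pose step: gradient descent on mean((||p - t|| - r)^2) over the translation t."""
    t = t_init.copy()
    for _ in range(steps):
        diff = points - t
        dist = np.linalg.norm(diff, axis=1)
        residual = dist - radius  # SDF value of each observed point
        grad = np.mean(-2.0 * residual[:, None] * diff / dist[:, None], axis=0)
        t -= lr * grad
    return t

def simulate_frame(rng, centre, radius=1.0, n=60, noise=0.02):
    """Partial observation: noisy points on the half of the sphere with z >= 0."""
    dirs = rng.normal(size=(n, 3))
    dirs[:, 2] = np.abs(dirs[:, 2])
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    return centre + radius * dirs + rng.normal(scale=noise, size=(n, 3))

def track_and_reconstruct(frames, t0):
    """Alternate pose estimation and shape adaptation over a sequence of frames."""
    history = [frames[0] - t0]       # observations moved into the object frame
    radius = fit_shape(history[0])   # shape initialization from the first frame
    poses = [t0.copy()]
    for pts in frames[1:]:
        t = estimate_pose(pts, radius, poses[-1])  # start from the previous pose
        poses.append(t)
        history.append(pts - t)
        radius = fit_shape(np.vstack(history))     # adapt shape on all history
    return np.array(poses), radius

rng = np.random.default_rng(0)
true_centres = np.array([[0.1 * k, 0.05 * k, 0.0] for k in range(10)])
frames = [simulate_frame(rng, c) for c in true_centres]
poses, radius = track_and_reconstruct(frames, true_centres[0])
final_error = float(np.linalg.norm(poses[-1] - true_centres[-1]))
```

In this toy setting the shape step has a closed form. With a learned model such as DeepSDF, both steps would instead be gradient-based optimizations through the network.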

What problem does this paper attempt to address?

This paper addresses the problem of tracking 3D objects and reconstructing their shapes in complex scenes. Specifically, the authors point out that although implicit functions have made remarkable progress in high-quality 3D shape reconstruction in recent years, these methods still struggle with cluttered and partially observable LiDAR data. The paper therefore proposes a novel, unified framework that uses a neural implicit function to perform 3D object tracking and reconstruction simultaneously. The framework adapts to video data online: iteratively improving the shape reconstruction improves tracking performance, and vice versa. Experimental results show that the method significantly outperforms existing methods on both tracking and shape reconstruction tasks on the Waymo and KITTI datasets.

### Main Contributions:

1. **Proposed a new framework**: The framework adapts online to video data and performs object tracking and shape reconstruction simultaneously, improving the performance of both tasks.
2. **Introduced learning-based implicit functions and their shape priors**: Demonstrated their effectiveness in the joint task.
3. **Achieved state-of-the-art performance in 3D single-object tracking** while also improving shape reconstruction.

### Method Overview:

1. **Shape Initialization**: Initialize the shape code using the point cloud of the initial frame.
2. **Pose Estimation**: Estimate the pose of the object by optimizing a differentiable template-matching process.
3. **Shape Adaptation**: Optimize the shape code using historical observations to improve shape reconstruction.

### Experimental Setup:

- **Pre-training**: Pre-train the DeepSDF model on the "car" category of the ShapeNet Core dataset.
- **Dataset**: Experiments were carried out on the Waymo and KITTI datasets.
- **Evaluation Metrics**: Success Rate (Success), Precision, Accuracy, Robustness, Asymmetric Chamfer Distance (ACD), and Recall.

### Experimental Results:

- **Waymo Dataset**: Performs comparably to SOTracker on the easy subset and significantly outperforms SOTracker on the difficult subset.
- **KITTI Dataset**: Outperforms most existing methods on both tracking and shape reconstruction tasks, with an especially large improvement over SOTracker.

### Ablation Study:

- **Effectiveness of the Online Adaptation Mechanism**: Verified by adapting the shape only in the first few frames of each tracklet and fixing the shape code in subsequent frames.
- **Impact of Regularization, Shape Loss, and Detection Loss**: Ablation studies analyze how each component affects performance and demonstrate the importance of each one.

In conclusion, this paper proposes a method that jointly addresses 3D object tracking and shape reconstruction through an online adaptation mechanism and demonstrates superior performance on multiple datasets.
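
Of the metrics listed in the summary, the Asymmetric Chamfer Distance is the easiest to show in a few lines. The sketch below assumes the common one-directional form: the mean squared distance from each observed point to its nearest point on the reconstruction. The summary does not state the paper's exact definition (for example squared versus unsquared distance), so treat this formula and the function name as assumptions.

```python
import numpy as np

def asymmetric_chamfer_distance(observed, reconstructed):
    """Mean squared distance from each observed point to its nearest
    reconstructed point (one direction only; assumed definition)."""
    diffs = observed[:, None, :] - reconstructed[None, :, :]  # shape (N, M, 3)
    sq_dists = np.sum(diffs ** 2, axis=2)                     # shape (N, M)
    return float(np.mean(np.min(sq_dists, axis=1)))

observed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
reconstructed = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [5.0, 5.0, 5.0]])

forward = asymmetric_chamfer_distance(observed, reconstructed)   # 0.0
backward = asymmetric_chamfer_distance(reconstructed, observed)  # 22.0
```

The asymmetry is visible in the example: the extra reconstructed point at (5, 5, 5) is not penalized in the observed-to-reconstruction direction. This is one plausible reason to use a one-directional distance with partial LiDAR scans, where a complete reconstruction necessarily contains surface that was never observed.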

Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

Exploit Spatiotemporal Contextual Information for 3D Single Object Tracking Via Memory Networks

Adaptive Resolution Optimization and Tracklet Reliability Assessment for Efficient Multi-Object Tracking

BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects

MV-DeepSDF: Implicit Modeling with Multi-Sweep Point Clouds for 3D Vehicle Reconstruction in Autonomous Driving

L4D-Track: Language-to-4D Modeling Towards 6-DoF Tracking and Shape Reconstruction in 3D Point Cloud Stream

Mending Neural Implicit Modeling for 3D Vehicle Reconstruction in the Wild

Joint Monocular 3D Vehicle Detection and Tracking

Tracking Objects with 3D Representation from Videos

Online Depth Image-Based Object Tracking with Sparse Representation and Object Detection

Monocular Quasi-Dense 3D Object Tracking

On-line Object Reconstruction and Tracking for 3D Interaction.

Efficient Implicit Neural Reconstruction Using LiDAR

Online Learning 3d Context for Robust Visual Tracking

Tracking Emerges by Looking Around Static Scenes, with Neural 3D Mapping

Deep Active Contours for Real-time 6-Dof Object Tracking

Deep Reinforcement Learning With Iterative Shift For Visual Tracking

Recursive Least-Squares Estimator-Aided Online Learning for Visual Tracking

Online Parallel Framework for Real-Time Visual Tracking.

Dynamic Object Tracking for Self-Driving Cars Using Monocular Camera and LIDAR.

Robust Performance-driven 3D Face Tracking in Long Range Depth Scenes.