Online Adaptation for Implicit Object Tracking and Shape Reconstruction in the Wild

Jianglong Ye, Yuntao Chen, Naiyan Wang, Xiaolong Wang
DOI: https://doi.org/10.1109/LRA.2022.3189185
2022-07-05
Abstract: Tracking and reconstructing 3D objects from cluttered scenes are the key components for computer vision, robotics and autonomous driving systems. While recent progress in implicit function has shown encouraging results on high-quality 3D shape reconstruction, it is still very challenging to generalize to cluttered and partially observable LiDAR data. In this paper, we propose to leverage the continuity in video data. We introduce a novel and unified framework which utilizes a neural implicit function to simultaneously track and reconstruct 3D objects in the wild. Our approach adapts the DeepSDF model (i.e., an instantiation of the implicit function) in the video online, iteratively improving the shape reconstruction while in return improving the tracking, and vice versa. We experiment with both Waymo and KITTI datasets and show significant improvements over state-of-the-art methods for both tracking and shape reconstruction tasks. Our project page is at https://jianglongye.com/implicit-tracking.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the problem of tracking 3D objects and reconstructing their shapes in complex scenes. Specifically, the authors point out that although implicit functions have made remarkable progress in high-quality 3D shape reconstruction in recent years, these methods still struggle with cluttered and partially observable LiDAR data. To address this, the paper proposes a novel, unified framework that uses a neural implicit function to perform 3D object tracking and reconstruction simultaneously. The framework adapts to video data online: iteratively improving the shape reconstruction improves tracking, and vice versa. Experiments on the Waymo and KITTI datasets show that the method significantly outperforms existing methods on both tracking and shape reconstruction.

### Main Contributions:

1. **Proposed a new framework**: The framework adapts online on video data and performs object tracking and shape reconstruction simultaneously, improving the performance of both tasks.
2. **Introduced learning-based implicit functions and their shape priors**: Demonstrated their effectiveness in the joint task.
3. **Achieved state-of-the-art performance in 3D single-object tracking** while also improving shape reconstruction.

### Method Overview:

1. **Shape Initialization**: Initialize the shape code using the point cloud of the initial frame.
2. **Pose Estimation**: Estimate the pose of the object by optimizing a differentiable template-matching process.
3. **Shape Adaptation**: Optimize the shape code using historical observations to improve shape reconstruction.

### Experimental Setup:

- **Pre-training**: The DeepSDF model is pre-trained on the "car" category of the ShapeNet Core dataset.
- **Dataset**: Experiments were carried out on the Waymo and KITTI datasets.
- **Evaluation Metrics**: Success Rate (Success), Precision, Accuracy, Robustness, Asymmetric Chamfer Distance (ACD), and Recall.

### Experimental Results:

- **Waymo Dataset**: Performs comparably to SOTracker on the easy subset and significantly outperforms SOTracker on the difficult subset.
- **KITTI Dataset**: Outperforms most existing methods on both tracking and shape reconstruction, with a particularly significant improvement over SOTracker.

### Ablation Study:

- **Effectiveness of the Online Adaptation Mechanism**: Verified by adapting the shape only in the first few frames of each tracklet and fixing the shape code in subsequent frames.
- **Impact of Regularization, Shape Loss, and Detection Loss**: Ablation studies analyze how each component affects performance and demonstrate the importance of each.

In conclusion, this paper proposes a method that jointly solves 3D object tracking and shape reconstruction through an online adaptation mechanism and demonstrates superior performance on multiple datasets.
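
The three stages listed in the Method Overview (shape initialization, pose estimation, shape adaptation) can be illustrated with a toy loop. This is only a sketch under strong assumptions, not the authors' implementation: an analytic sphere SDF stands in for the learned DeepSDF network, the "shape code" is a single radius, the pose is translation-only, and every name (`sdf`, `fit_shape`, `estimate_pose`, `observe`, `track`) and hyperparameter is hypothetical.

```python
import math

def sdf(point, radius):
    """Signed distance from an object-frame point to a sphere (toy stand-in for DeepSDF)."""
    return math.sqrt(sum(c * c for c in point)) - radius

def fit_shape(points_obj, radius, prior=1.0, reg=0.001, lr=0.2, steps=200):
    """Gradient descent on the shape code (here: the radius) so that observed
    surface points get SDF ~ 0, with a small pull toward a prior value."""
    for _ in range(steps):
        grad = sum(-2.0 * sdf(p, radius) for p in points_obj) / len(points_obj)
        grad += 2.0 * reg * (radius - prior)
        radius -= lr * grad
    return radius

def estimate_pose(points_world, radius, t, lr=0.2, steps=300):
    """Gradient descent on the translation t so that the observed points land
    on the zero level set of the current shape (a simple template match)."""
    t = list(t)
    for _ in range(steps):
        grad = [0.0, 0.0, 0.0]
        for p in points_world:
            dist = math.dist(p, t) or 1e-9
            residual = dist - radius
            for i in range(3):
                # d/dt (|p - t| - r)^2 = -2 * residual * (p - t) / |p - t|
                grad[i] += -2.0 * residual * (p[i] - t[i]) / dist
        t = [t[i] - lr * grad[i] / len(points_world) for i in range(3)]
    return t

def observe(center, radius, view_dir, n=12):
    """Simulate a partial scan: keep only surface points whose normal faces the sensor."""
    pts = []
    for i in range(n):
        for j in range(n):
            theta = math.pi * (i + 0.5) / n
            phi = 2.0 * math.pi * j / n
            normal = (math.sin(theta) * math.cos(phi),
                      math.sin(theta) * math.sin(phi),
                      math.cos(theta))
            if sum(a * b for a, b in zip(normal, view_dir)) > 0.0:
                pts.append(tuple(center[k] + radius * normal[k] for k in range(3)))
    return pts

def track(frames, t0, prior=1.0):
    # 1. Shape initialization: fit the shape code to the first frame,
    #    whose pose t0 is assumed to be given.
    history = [tuple(p[i] - t0[i] for i in range(3)) for p in frames[0]]
    radius = fit_shape(history, prior, prior=prior)
    pose, poses = list(t0), [list(t0)]
    for points in frames[1:]:
        # 2. Pose estimation, warm-started from the previous frame's pose.
        pose = estimate_pose(points, radius, pose)
        poses.append(pose)
        # 3. Shape adaptation: move the new points into the object frame,
        #    add them to the history, and refine the shape code on all of it.
        history += [tuple(p[i] - pose[i] for i in range(3)) for p in points]
        radius = fit_shape(history, radius, prior=prior)
    return poses, radius

true_radius = 1.5
centers = [(0.3 * k, 0.1 * k, 0.0) for k in range(5)]  # object drifts each frame
frames = [observe(c, true_radius, (-1.0, 0.0, 0.0)) for c in centers]
poses, radius = track(frames, centers[0])
pose_error = max(math.dist(p, c) for p, c in zip(poses, centers))
print(f"estimated radius: {radius:.3f}, worst pose error: {pose_error:.4f}")
```

In this toy setting a single partial view already pins down the radius, so the adaptation step changes little; with a learned shape prior and noisy, sparse LiDAR points, accumulating observations over time is what would be expected to matter.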