Image-to-Point Registration Via Cross-Modality Correspondence Retrieval

Lin Bie, Siqi Li, Kai Cheng
DOI: https://doi.org/10.1145/3652583.3658074
2024-01-01
Abstract: Image-to-Point Cloud registration between 2D images and 3D LiDAR point clouds is an important task in computer vision. The traditional registration pipeline first establishes correspondences between images and point clouds and then estimates the pose from the resulting matches. However, 2D-3D correspondences are inherently difficult to establish because of the large modality gap between images and LiDAR point clouds. To this end, we build a bridge that alleviates the 2D-3D modality gap by aligning LiDAR point clouds to virtual points generated from images. In this way, the modality gap is reduced to a domain gap between two types of point clouds, i.e., original point clouds and virtual point clouds. Concretely, our framework fuses features from the LiDAR and virtual point clouds using a Transformer layer. To relieve the domain gap, we propose a frustum points retrieval module and a combined correspondences retrieval module. Candidate correspondences are generated by retrieving features and position descriptors simultaneously, and the correct correspondences are then selected among these candidates based on the consistency between the feature and the position descriptor. In our implementation, we design a frustum retrieval loss and a combined correspondence retrieval loss for cross-modality correspondence retrieval. Experimental results and comparisons with state-of-the-art Image-to-Point Cloud methods on the KITTI and nuScenes datasets demonstrate that the proposed method achieves superior performance.
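The abstract does not say how the two retrieval modules are built (both are learned components). As a rough, hand-crafted analogue only, the Python sketch below illustrates two ingredients under stated assumptions: (1) labelling which 3D points fall inside the camera frustum by pinhole projection, assuming the points are already in the camera frame and the intrinsics fx, fy, cx, cy are known, which is one plausible way to obtain targets for a frustum-style retrieval loss; and (2) keeping a candidate correspondence only when nearest-neighbour retrieval in feature space and in position-descriptor space pick the same point. The names `in_frustum`, `nearest`, and `consistent_matches`, and the plain Euclidean nearest-neighbour rule, are hypothetical illustrations, not the authors' modules.

```python
import math
from typing import List, Sequence, Tuple

Vec = Sequence[float]


def in_frustum(points: List[Vec], fx: float, fy: float, cx: float, cy: float,
               width: int, height: int) -> List[bool]:
    """Hypothetical helper: label camera-frame points as inside/outside the image.

    Assumes points are already expressed in the camera frame
    (x right, y down, z forward) and an ideal pinhole camera.
    """
    labels = []
    for x, y, z in points:
        if z <= 0:  # behind (or on) the camera plane: cannot be imaged
            labels.append(False)
            continue
        u = fx * x / z + cx  # pinhole projection to pixel column
        v = fy * y / z + cy  # pinhole projection to pixel row
        labels.append(0 <= u < width and 0 <= v < height)
    return labels


def nearest(query: Vec, candidates: List[Vec]) -> int:
    """Index of the candidate closest to query (Euclidean; first index wins ties)."""
    return min(range(len(candidates)), key=lambda j: math.dist(query, candidates[j]))


def consistent_matches(feat_a: List[Vec], feat_b: List[Vec],
                       pos_a: List[Vec], pos_b: List[Vec]) -> List[Tuple[int, int]]:
    """Hypothetical helper: keep (i, j) only when both retrievals agree.

    feat_*: per-point feature vectors; pos_*: per-point position descriptors.
    Set "a" stands in for image-derived virtual points, set "b" for LiDAR points.
    """
    matches = []
    for i in range(len(feat_a)):
        j_feat = nearest(feat_a[i], feat_b)  # candidate from feature retrieval
        j_pos = nearest(pos_a[i], pos_b)     # candidate from position retrieval
        if j_feat == j_pos:                  # consistency check between the two
            matches.append((i, j_feat))
    return matches


if __name__ == "__main__":
    pts = [(0.0, 0.0, 5.0), (100.0, 0.0, 5.0), (0.0, 0.0, -1.0)]
    print(in_frustum(pts, 500.0, 500.0, 320.0, 240.0, 640, 480))
    # -> [True, False, False]
    print(consistent_matches([(1.0, 0.0), (0.0, 1.0)], [(0.9, 0.1), (0.1, 0.9)],
                             [(0.0, 0.0), (0.0, 0.0)], [(0.0, 0.0), (3.0, 4.0)]))
    # -> [(0, 0)]: query 1 is dropped because its feature match (1) and
    #    position match (0) disagree
```

With fx = fy = 500, cx = 320, cy = 240 and a 640×480 image, the point (0, 0, 5) projects to the image centre and is labelled inside, while any point with z ≤ 0 is labelled outside. The agreement test is deliberately strict: it trades recall for precision, which is the usual motivation for filtering candidates by a second, independent cue before pose estimation.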