Discriminative Correspondence Estimation for Unsupervised RGB-D Point Cloud Registration
Chenbo Yan,Mingtao Feng,Zijie Wu,Yulan Guo,Weisheng Dong,Yaonan Wang,Ajmal Mian
DOI: https://doi.org/10.1109/tcsvt.2024.3480268
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Point cloud registration is a fundamental task that estimates the rigid transformation matrix between two point clouds, and is regarded as a prerequisite for downstream vision tasks. Recent works have sought to address the registration problem using readily obtainable RGB-D sequences, rather than relying solely on point clouds, which may not always be available. These methods typically follow a common paradigm: extracting features from the input data, estimating correspondences, and obtaining the transformation matrix through geometric fitting. However, most existing unsupervised RGB-D point cloud registration works struggle to obtain fine-grained, robust, and discriminative correspondences because they simply concatenate multimodal features, which also increases the feature vector dimension. In this work, we design a generative feature extraction module to fully leverage multimodal information, and propose a novel perspective on correspondence estimation that expands the points in the source and target point clouds into hyperrectangle-based embeddings and uses their mutual relationships, measured by intersections in n-dimensional space, as the basis for estimating correspondences. Each hyperrectangle-based embedding is built upon the natural and discriminative semantics from the proposed generative feature extraction module, which comprises a diffusion branch, a geometric branch, and point-pixel fusion. We harness the capability of the generative model to exploit the information from both complementary modalities in RGB-D frames. Furthermore, this distinctive geometric space allows for efficient calculation of intersection volumes and modeling of conditional probabilities for estimating correspondences. Extensive experiments on the 3DMatch and ScanNet datasets show the effectiveness of the proposed method in this challenging task, outperforming state-of-the-art approaches. Our code will be released at: https://github.com/cbyan1003/DCE.
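To make the hyperrectangle idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes each point is embedded as an axis-aligned box given by a lower and an upper corner, and that the conditional probability of box a given box b is the intersection volume divided by the volume of b. The function names (`box_volume`, `intersection`, `conditional_prob`, `best_match`) and the argmax matching rule are hypothetical choices made for this example; the abstract does not specify them.

```python
from math import prod


def box_volume(lo, hi):
    """Volume of an axis-aligned hyperrectangle; 0 if it is empty on any axis."""
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))


def intersection(lo_a, hi_a, lo_b, hi_b):
    """Corners of the overlap of two boxes (may be empty, i.e. lo > hi)."""
    lo = [max(a, b) for a, b in zip(lo_a, lo_b)]
    hi = [min(a, b) for a, b in zip(hi_a, hi_b)]
    return lo, hi


def conditional_prob(box_a, box_b):
    """Assumed definition: P(a | b) = vol(a ∩ b) / vol(b)."""
    lo, hi = intersection(*box_a, *box_b)
    vol_b = box_volume(*box_b)
    return box_volume(lo, hi) / vol_b if vol_b > 0 else 0.0


def best_match(source_box, target_boxes):
    """Index of the target box with the highest P(target | source)."""
    scores = [conditional_prob(t, source_box) for t in target_boxes]
    return max(range(len(scores)), key=scores.__getitem__)


# Each box is a (lower_corner, upper_corner) pair in n-dimensional space.
a = ([0.0, 0.0], [2.0, 2.0])
b = ([1.0, 1.0], [3.0, 3.0])
c = ([5.0, 5.0], [6.0, 6.0])

p_ab = conditional_prob(a, b)    # overlap volume 1 over vol(b) = 4
p_ac = conditional_prob(a, c)    # disjoint boxes give zero
idx = best_match(a, [c, b])      # b overlaps a, c does not
```

Because the intersection of two axis-aligned boxes only needs a per-axis max and min, the overlap volume is cheap to compute, which is consistent with the efficiency claim in the abstract. How the boxes are learned from the fused features is outside the scope of this sketch.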