Abstract: Point cloud registration is an essential technology in computer vision and robotics. Recently, transformer-based methods have achieved advanced performance in point cloud registration by exploiting the transformer's order invariance and its ability to model dependencies when aggregating information. However, they still suffer from indistinct feature extraction and from sensitivity to noise and outliers, owing to three major limitations: 1) the adoption of CNNs fails to model global relations because of their local receptive fields, leaving the extracted features susceptible to noise; 2) the shallow-wide architecture of transformers and the lack of positional information lead to indistinct feature extraction due to inefficient information interaction; and 3) insufficient consideration of geometrical compatibility leads to ambiguous identification of incorrect correspondences. To address these limitations, a novel full transformer network for point cloud registration is proposed, named the deep interaction transformer (DIT), which incorporates: 1) a point cloud structure extractor (PSE) to retrieve structural information and model global relations with the local feature integrator (LFI) and transformer encoders; 2) a deep-narrow point feature transformer (PFT) to facilitate deep information interaction across a pair of point clouds with positional information, such that transformers establish comprehensive associations and directly learn the relative position between points; and 3) a geometric matching-based correspondence confidence evaluation (GMCCE) method to measure spatial consistency and estimate correspondence confidence using the designed triangulated descriptor. Extensive experiments on the ModelNet40, ScanObjectNN, and 3DMatch datasets demonstrate that our method precisely aligns point clouds and consequently achieves superior performance compared with state-of-the-art methods.
The code is publicly available at https://github.com/CGuangyan-BIT/DIT.
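
The idea of measuring spatial consistency between correspondences can be illustrated with a minimal sketch. This is not the GMCCE method or its triangulated descriptor; it is a simpler, generic pairwise length-consistency score, based only on the fact that a rigid transform preserves distances between points. The function names, the tolerance `tau`, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def pairwise_consistency_confidence(src, tgt, tau=0.05):
    """Score each putative correspondence (src[i] <-> tgt[i]) by how many
    other correspondences agree with it under a rigid motion.

    A rigid transform preserves distances, so for two correct matches i and j
    the lengths ||src[i] - src[j]|| and ||tgt[i] - tgt[j]|| should be equal.
    `tau` is a hypothetical tolerance (same units as the points) for noise.
    """
    src = np.asarray(src, dtype=float)
    tgt = np.asarray(tgt, dtype=float)
    # (N, N) matrices of intra-cloud distances between matched points.
    d_src = np.linalg.norm(src[:, None, :] - src[None, :, :], axis=-1)
    d_tgt = np.linalg.norm(tgt[:, None, :] - tgt[None, :, :], axis=-1)
    compatible = np.abs(d_src - d_tgt) < tau
    np.fill_diagonal(compatible, False)  # a match does not vote for itself
    # Fraction of the other N-1 correspondences that are length-consistent.
    return compatible.sum(axis=1) / max(len(src) - 1, 1)


def demo():
    """Synthetic example: 17 correct matches and 3 gross outliers."""
    rng = np.random.default_rng(0)
    src = rng.uniform(-1.0, 1.0, size=(20, 3))
    # Ground-truth rigid motion: 30 degree rotation about z plus a shift.
    theta = np.deg2rad(30.0)
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
    t = np.array([0.5, -0.2, 0.1])
    tgt = src @ R.T + t
    # Push the last three targets far away so they become clear outliers.
    tgt[-3:] += rng.uniform(10.0, 12.0, size=(3, 3))
    return pairwise_consistency_confidence(src, tgt)
```

In this sketch, correct correspondences receive high scores because they agree with one another, while the displaced ones receive low scores. Such scores could, for example, be used as weights when the final rigid transform is estimated.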

Full Transformer Framework for Robust Point Cloud Registration With Deep Information Interaction
