Abstract: LiDAR‐based 3D place recognition is an essential component of simultaneous localization and mapping systems in multi‐scene robotic applications. However, extracting discriminative and generalizable global descriptors from point clouds remains an open problem because existing approaches make insufficient use of the information contained in LiDAR scans. In this paper, we propose a novel spatial‐temporal point cloud encoding network for multiple scenes, dubbed STM‐Net, to fully fuse the multi‐view spatial information and the temporal information of LiDAR point clouds. Specifically, we first develop a spatial feature encoding module consisting of a single‐view transformer and a multi‐view transformer. This module learns correlations both within a single view and between two views, using multi‐layer range images generated by spherical projection and multi‐layer bird's eye view images generated by top‐down projection. Then, in the temporal feature encoding module, a temporal transformer mines the temporal information in sequential point clouds, and a NetVLAD layer aggregates the features into sub‐descriptors. Finally, a GeM pooling layer fuses information along the time dimension to produce the final global descriptors. Extensive experiments conducted on unmanned ground and surface vehicles with different LiDAR configurations indicate that our method (1) achieves better place recognition performance than state‐of‐the‐art algorithms, (2) generalizes well to diverse scenes, (3) is robust to viewpoint changes, and (4) can operate in real time. These results demonstrate the effectiveness of the proposed approach and highlight its promise for multi‐scene place recognition tasks.
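
The two projections named in the abstract can be illustrated with a minimal NumPy sketch. It is not the STM‐Net implementation: the function names, image sizes, field of view, and grid extent are all assumptions chosen for illustration, and only a single layer per view is produced. A multi‐layer variant would presumably repeat each projection over several range or height intervals, but the abstract does not specify how the layers are formed.

```python
import numpy as np


def range_image(points, height=32, width=900, fov_up_deg=15.0, fov_down_deg=-15.0):
    """Spherical projection of an (N, 3) point cloud to a (height, width) range image.

    All default parameters are illustrative assumptions, not values from the paper.
    """
    fov_up = np.radians(fov_up_deg)
    fov_down = np.radians(fov_down_deg)

    r = np.linalg.norm(points, axis=1)
    keep = r > 1e-6                                  # drop points at the sensor origin
    pts, r = points[keep], r[keep]

    yaw = np.arctan2(pts[:, 1], pts[:, 0])           # azimuth in [-pi, pi]
    pitch = np.arcsin(pts[:, 2] / r)                 # elevation in [-pi/2, pi/2]

    u = 0.5 * (1.0 - yaw / np.pi) * width            # column from azimuth
    v = (fov_up - pitch) / (fov_up - fov_down) * height  # row from elevation

    inside = (v >= 0) & (v < height)                 # discard points outside the vertical FOV
    u = np.floor(u[inside]).astype(int) % width      # wrap the azimuth seam
    v = np.floor(v[inside]).astype(int)
    r = r[inside]

    image = np.zeros((height, width))
    order = np.argsort(-r)                           # write far points first, near points last
    image[v[order], u[order]] = r[order]             # the nearest return wins per pixel
    return image


def bev_image(points, grid=64, max_xy=50.0):
    """Top-down projection: a (grid, grid) image holding the maximum height per cell.

    A Cartesian grid is an assumption here; empty cells are set to 0.
    """
    cell = 2.0 * max_xy / grid
    col = np.floor((points[:, 0] + max_xy) / cell).astype(int)
    row = np.floor((points[:, 1] + max_xy) / cell).astype(int)
    inside = (col >= 0) & (col < grid) & (row >= 0) & (row < grid)

    image = np.full((grid, grid), -np.inf)
    np.maximum.at(image, (row[inside], col[inside]), points[inside, 2])
    image[np.isinf(image)] = 0.0                     # mark cells that received no point
    return image


if __name__ == "__main__":
    cloud = np.random.default_rng(0).uniform(-40.0, 40.0, size=(5000, 3))
    print(range_image(cloud).shape, bev_image(cloud).shape)
```

Both functions return dense 2D arrays, so the two views of one scan can be passed to image‐style encoders. The nearest‐return rule in the range image and the maximum‐height rule in the bird's eye view are common conventions adopted here as assumptions.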

C2L-PR: Cross-modal Camera-to-LiDAR Place Recognition Via Modality Alignment and Orientation Voting

Persistent Stereo Visual Localization on Cross-Modal Invariant Map

Multimodal Localization: Stereo over LiDAR Map

LocNet: Global Localization in 3D Point Clouds for Mobile Robots

3D LiDAR-Based Global Localization Using Siamese Neural Network

Laser Map Aided Visual Inertial Localization in Changing Environment

RINet: Efficient 3D Lidar-Based Place Recognition Using Rotation Invariant Neural Network

A Novel Place Recognition Network Using Visual Sequences and LiDAR Point Clouds for Autonomous Vehicles

Monocular Visual Place Recognition in LiDAR Maps via Cross-Modal State Space Model and Multi-View Matching

LCPR: A Multi-Scale Attention-Based LiDAR-Camera Fusion Network for Place Recognition

Camera-LiDAR Fusion with Latent Contact for Place Recognition in Challenging Cross-Scenes

ModaLink: Unifying Modalities for Efficient Image-to-PointCloud Place Recognition

(LC)$^2$: LiDAR-Camera Loop Constraints For Cross-Modal Place Recognition

Attention-Enhanced Cross-modal Localization Between Spherical Images and Point Clouds

Global Localization in Large-scale Point Clouds via Roll-pitch-yaw Invariant Place Recognition and Low-overlap Global Registration

MSSPlace: Multi-Sensor Place Recognition with Visual and Text Semantics

A fast LiDAR place recognition and localization method by fusing local and global search

CCL: Continual Contrastive Learning for LiDAR Place Recognition

VXP: Voxel-Cross-Pixel Large-scale Image-LiDAR Place Recognition

CVTNet: A Cross-View Transformer Network for Place Recognition Using LiDAR Data

LiDAR‐based place recognition for mobile robots in ground/water surface multiple scenes