Abstract: Large-scale LiDAR mappings and localization leverage place recognition techniques to mitigate odometry drifts, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real-time.
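The pooling and aggregation stages named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the query vectors, the weight matrices `w_token` and `w_channel`, the single-layer mixing, and all sizes are assumptions chosen for illustration, and random values stand in for learned parameters. The point it shows is that a variable number of local descriptors is pooled into a fixed number of tokens, so the scene descriptor has the same length for any point cloud size.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(local_feats, queries):
    """Pool (N, D) local descriptors into (K, D) tokens using K query vectors."""
    d = local_feats.shape[1]
    scores = queries @ local_feats.T / np.sqrt(d)  # (K, N) similarity scores
    weights = softmax(scores, axis=1)  # per token, weights over the N points sum to 1
    return weights @ local_feats  # (K, D), independent of N

def mixer_aggregate(tokens, w_token, w_channel):
    """Mix across tokens, then across channels, and flatten to a unit-norm vector."""
    mixed = tokens + np.maximum(w_token @ tokens, 0.0)  # token mixing with residual
    reduced = np.maximum(mixed @ w_channel, 0.0)  # channel mixing, (K, D_out)
    desc = reduced.reshape(-1)
    return desc / (np.linalg.norm(desc) + 1e-12)

def describe_scene(local_feats, params):
    """Hypothetical helper: local descriptors -> tokens -> scene descriptor."""
    tokens = attention_pool(local_feats, params["queries"])
    return mixer_aggregate(tokens, params["w_token"], params["w_channel"])

rng = np.random.default_rng(0)
D, K, D_OUT = 16, 8, 4  # illustrative sizes, not taken from the paper
params = {
    "queries": rng.normal(size=(K, D)),  # stand-in for learned queries
    "w_token": rng.normal(size=(K, K)) / np.sqrt(K),
    "w_channel": rng.normal(size=(D, D_OUT)) / np.sqrt(D),
}
small = describe_scene(rng.normal(size=(500, D)), params)
large = describe_scene(rng.normal(size=(3000, D)), params)
print(small.shape, large.shape)  # (32,) (32,)
```

Both clouds, with 500 and 3000 points, yield a descriptor of length `K * D_OUT`, which is what makes nearest-neighbour retrieval against a database straightforward.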

What problem does this paper attempt to address?

This paper addresses the challenges of large-scale LiDAR (Light Detection and Ranging) place recognition, in particular improving localization and retrieval accuracy while preserving real-time performance and computational efficiency. Traditional LiDAR place recognition systems perform poorly on large and complex scenes, and although existing deep-learning methods have made some improvements, they struggle to balance accuracy against computational cost.

### Main problems of the paper

1. **Limitations of traditional methods**:
   - Traditional methods based on handcrafted feature statistics and histograms cannot effectively describe large and complex scenes.
   - Existing deep-learning methods have improved performance, but they remain weak at aggregating information from sparse, distant points.
2. **Real-time performance and computational efficiency**:
   - Existing methods often need substantial computational resources to achieve high-precision localization, which makes it difficult to meet the requirements of real-time applications.
3. **Data association and geometric verification**:
   - In SLAM (Simultaneous Localization and Mapping), place recognition is crucial for data association and geometric verification, and the performance of existing methods in this respect still needs improvement.

### Proposed solutions

To address these problems, the paper proposes SALSA (Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition), a lightweight and efficient LiDAR place recognition framework. Its main innovations are:

1. **Sphereformer backbone network**:
   - Uses radial window attention to improve information aggregation for sparse, distant points.
   - Combines it with conventional cubic window attention to make local feature description more robust.
2. **Adaptive attention pooling layer**:
   - Aggregates a variable number of local features into a fixed number of tokens through the self-attention mechanism, improving computational efficiency while retaining information.
3. **MLP Mixer aggregator**:
   - Uses a multi-layer perceptron (MLP) Mixer to fuse the tokens into a global scene descriptor, then reduces its dimensionality and decorrelates it through PCA whitening.
4. **Re-ranking mechanism**:
   - Re-ranks the retrieved point clouds with a spectral matching method based on a compatibility graph, further improving retrieval performance.

### Summary

By introducing radial window attention, adaptive attention pooling, and an MLP Mixer, SALSA improves both the accuracy and the computational efficiency of LiDAR place recognition while remaining suitable for real-time use. As a result, it outperforms existing methods on multiple large-scale LiDAR place recognition benchmark datasets.
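The re-ranking idea, scoring each retrieved candidate by how geometrically consistent its point correspondences are, can be sketched as follows. This is a simplified illustration of spectral matching on a compatibility graph, not the paper's implementation: the compatibility kernel, the `sigma` tolerance, the normalised leading-eigenvalue score, and the function names are all assumptions. It relies on the fact that a rigid transform preserves pairwise distances, so two correct correspondences should agree on the distance between their keypoints.

```python
import numpy as np

def compatibility_matrix(src, dst, sigma=0.5):
    """src, dst: (M, 3) matched keypoints. Entry (i, j) is close to 1 when
    correspondences i and j preserve the pairwise distance."""
    d_src = np.linalg.norm(src[:, None] - src[None], axis=-1)
    d_dst = np.linalg.norm(dst[:, None] - dst[None], axis=-1)
    compat = np.clip(1.0 - ((d_src - d_dst) / sigma) ** 2, 0.0, None)
    np.fill_diagonal(compat, 0.0)  # no self-compatibility
    return compat

def spectral_fitness(compat, iters=50):
    """Leading eigenvalue by power iteration, scaled to [0, 1] by (M - 1)."""
    m = compat.shape[0]
    if m < 2:
        return 0.0
    v = np.full(m, 1.0 / np.sqrt(m))
    for _ in range(iters):
        w = compat @ v
        norm = np.linalg.norm(w)
        if norm == 0.0:  # no compatible pairs at all
            return 0.0
        v = w / norm
    return float(v @ compat @ v) / (m - 1)

def rerank(candidates, sigma=0.5):
    """candidates: list of (candidate_id, src_pts, dst_pts).
    Returns candidate ids sorted by descending fitness."""
    scored = [(spectral_fitness(compatibility_matrix(s, d, sigma)), cid)
              for cid, s, d in candidates]
    return [cid for _, cid in sorted(scored, key=lambda t: -t[0])]

rng = np.random.default_rng(1)
src = rng.uniform(-10, 10, size=(30, 3))
angle = 0.7
rot = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                [np.sin(angle),  np.cos(angle), 0.0],
                [0.0, 0.0, 1.0]])
revisit = src @ rot.T + np.array([2.0, -1.0, 0.5])  # same place, new pose
unrelated = rng.uniform(-10, 10, size=(30, 3))      # wrong place
order = rerank([("unrelated", src, unrelated), ("revisit", src, revisit)])
print(order)  # ['revisit', 'unrelated']
```

The rigidly transformed copy scores close to 1 because every pair of correspondences is compatible, while the unrelated cloud scores near 0, so the true revisit moves to the top of the list even though it was retrieved second.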

SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition
