SALSA: Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition

Raktim Gautam Goswami, Naman Patel, Prashanth Krishnamurthy, Farshad Khorrami
2024-07-30
Abstract: Large-scale LiDAR mappings and localization leverage place recognition techniques to mitigate odometry drifts, ensuring accurate mapping. These techniques utilize scene representations from LiDAR point clouds to identify previously visited sites within a database. Local descriptors, assigned to each point within a point cloud, are aggregated to form a scene representation for the point cloud. These descriptors are also used to re-rank the retrieved point clouds based on geometric fitness scores. We propose SALSA, a novel, lightweight, and efficient framework for LiDAR place recognition. It consists of a Sphereformer backbone that uses radial window attention to enable information aggregation for sparse distant points, an adaptive self-attention layer to pool local descriptors into tokens, and a multi-layer-perceptron Mixer layer for aggregating the tokens to generate a scene descriptor. The proposed framework outperforms existing methods on various LiDAR place recognition datasets in terms of both retrieval and metric localization while operating in real-time.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses large-scale LiDAR (Light Detection and Ranging) place recognition: improving localization and retrieval accuracy while preserving real-time performance and computational efficiency. Specifically, traditional LiDAR place recognition systems perform poorly in large, complex scenes, and although existing deep-learning methods improve on them, they struggle to balance accuracy against computational cost.

### Main problems of the paper

1. **Limitations of traditional methods**:
   - Traditional methods based on handcrafted feature statistics and histograms cannot effectively describe large, complex scenes.
   - Existing deep-learning methods perform better, but they are still weak at aggregating information from sparse, distant points.
2. **Real-time performance and computational efficiency**:
   - Existing methods often need substantial computational resources to achieve high-precision localization, which makes it hard to meet real-time requirements.
3. **Data association and geometric verification**:
   - In SLAM (Simultaneous Localization and Mapping), place recognition is crucial for data association and geometric verification, and existing methods leave room for improvement here.

### Proposed solutions

To address these problems, the paper proposes SALSA (Swift Adaptive Lightweight Self-Attention for Enhanced LiDAR Place Recognition), a lightweight and efficient LiDAR place recognition framework. Its main components are:

1. **Sphereformer backbone network**:
   - Uses radial window attention to improve information aggregation for sparse, distant points.
   - Combines it with conventional cubic window attention to make local feature description more robust.
2. **Adaptive attention pooling layer**:
   - Uses self-attention to aggregate a variable number of local features into a fixed number of tokens, which improves computational efficiency while retaining information.
3. **MLP Mixer aggregator**:
   - Uses a multi-layer perceptron (MLP) Mixer to fuse the tokens into a global scene descriptor, then applies PCA whitening to reduce its dimension and decorrelate it.
4. **Re-ranking mechanism**:
   - Re-ranks the retrieved point clouds with spectral matching on a compatibility graph, further improving retrieval performance.

### Summary

By combining radial window attention, adaptive attention pooling, and an MLP Mixer, SALSA improves both the accuracy and the computational efficiency of LiDAR place recognition while running in real time. The paper reports that it outperforms existing methods on multiple large-scale LiDAR place recognition benchmark datasets.
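
To make the pooling and Mixer steps described above more concrete, here is a minimal sketch of how a variable number of local descriptors can be reduced to a fixed-size scene descriptor. It is an illustration under stated assumptions, not the paper's implementation: the learned query tokens (`queries`), the single-layer ReLU mixing weights (`w_token`, `w_channel`), and every function name are hypothetical, and random values stand in for trained parameters. Plain Python lists are used to keep the example dependency-free; a real model would use a tensor library, two-layer MLPs with normalization, and the PCA whitening step that is omitted here.

```python
import math
import random

def matmul(a, b):
    """Multiply an (n x m) matrix by an (m x p) matrix, both nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def softmax(row):
    m = max(row)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def add(a, b):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def relu(a):
    return [[max(0.0, v) for v in row] for row in a]

def attention_pool(descriptors, queries):
    """Pool N local descriptors (N x d) into k tokens (k x d).

    ASSUMPTION: `queries` (k x d) plays the role of learned query tokens;
    each token is an attention-weighted average of all local descriptors.
    """
    d = len(queries[0])
    scores = matmul(queries, transpose(descriptors))  # k x N
    weights = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(weights, descriptors)  # k x d, independent of N

def mixer_block(tokens, w_token, w_channel):
    """Simplified Mixer block: one linear layer + ReLU per mixing step."""
    tokens = add(tokens, relu(matmul(w_token, tokens)))    # mix across the k tokens
    tokens = add(tokens, relu(matmul(tokens, w_channel)))  # mix across the d channels
    return tokens

def scene_descriptor(tokens):
    """Flatten the tokens and L2-normalize them into one global vector."""
    flat = [v for row in tokens for v in row]
    norm = math.sqrt(sum(v * v for v in flat)) or 1.0
    return [v / norm for v in flat]

def describe(descriptors, queries, w_token, w_channel):
    tokens = attention_pool(descriptors, queries)
    return scene_descriptor(mixer_block(tokens, w_token, w_channel))

def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

# Demo: point clouds of different sizes map to descriptors of the same length.
rng = random.Random(0)
k, d = 4, 8  # hypothetical token count and feature width
queries = random_matrix(k, d, rng)
w_token = random_matrix(k, k, rng)
w_channel = random_matrix(d, d, rng)
for n_points in (50, 200):
    cloud = random_matrix(n_points, d, rng)  # stand-in for backbone features
    print(n_points, len(describe(cloud, queries, w_token, w_channel)))
```

Both clouds yield a descriptor of length `k * d = 32`, regardless of how many points went in. That fixed size is what lets the descriptors be compared directly during database retrieval.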