LEST: Large-scale LiDAR Semantic Segmentation with Transformer

Chuanyu Luo, Nuo Cheng, Sikun Ma, Han Li, Xiaohan Li, Shengguang Lei, Pu Li
DOI: https://doi.org/10.48550/arXiv.2307.09367
2023-07-14
Abstract: Large-scale LiDAR-based point cloud semantic segmentation is a critical task in autonomous driving perception. Almost all of the previous state-of-the-art LiDAR semantic segmentation methods are variants of sparse 3D convolution. Although the Transformer architecture is becoming popular in the field of natural language processing and 2D computer vision, its application to large-scale point cloud semantic segmentation is still limited. In this paper, we propose a LiDAR sEmantic Segmentation architecture with pure Transformer, LEST. LEST comprises two novel components: a Space Filling Curve (SFC) Grouping strategy and a Distance-based Cosine Linear Transformer, DISCO. On the public nuScenes semantic segmentation validation set and SemanticKITTI test set, our model outperforms all the other state-of-the-art methods.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses large-scale LiDAR point cloud semantic segmentation. Specifically, it asks how the Transformer architecture can be used to process large-scale point cloud data in an autonomous driving perception system and achieve high-precision semantic segmentation.

### Background and Problem

In an autonomous driving system, LiDAR-based 3D environmental perception is crucial for safe and reliable driving. Unlike image-based 2D perception, large-scale point cloud data is irregular, sparse, and unordered, which makes 3D perception more challenging. Semantic segmentation is harder still, because it requires a label for every point and therefore finer-grained spatial information.

### Limitations of Existing Methods

1. **Traditional Methods**: Early methods such as PointNet aggregate the features of local unordered points through max-pooling, but this approach is inefficient on large-scale point clouds.
2. **3D Convolution Methods**: Although sparse 3D convolution performs well in 3D object detection, its performance in large-scale point cloud semantic segmentation is limited by the cubic complexity of the convolution kernel and by its limited receptive field.
3. **Transformer Application**: Although the Transformer has achieved great success in natural language processing (NLP) and 2D computer vision, its application to large-scale point cloud semantic segmentation is still limited. The main reason is the sheer size of the point clouds: applying a standard Transformer directly leads to high computational complexity.

### Main Contributions of the Paper

1. **Proposing the LEST Architecture**: The authors propose a pure Transformer architecture, LEST (Large-scale LiDAR Semantic Segmentation with Transformer), for large-scale LiDAR point cloud semantic segmentation.
2. **SFC Grouping Strategy**: A grouping strategy based on a Space Filling Curve (SFC) partitions the point cloud efficiently, and a standard Transformer is applied within each group to aggregate local features. The strategy keeps the number of points in each group almost the same, which bounds the cost of attention within each group and so reduces the overall computational complexity.
3. **DISCO Module**: A new linear Transformer, the Distance-based Cosine Linear Transformer (DISCO), is proposed to build a global receptive field with linear complexity. DISCO overcomes the limitations of the conventional dot-product and cosine similarities by using the 1-norm distance between vectors as its similarity measure.

### Experimental Results

On two large-scale LiDAR semantic segmentation datasets, the nuScenes validation set and the SemanticKITTI test set, the LEST model outperforms the existing state-of-the-art methods. According to the reported results, LEST improves computational efficiency and also significantly improves segmentation accuracy in multiple categories.

### Summary

By introducing the SFC Grouping strategy and the DISCO module, the paper applies a pure Transformer to large-scale LiDAR point cloud semantic segmentation and addresses the efficiency and performance problems that existing methods face on large-scale point cloud data.
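
To make the idea behind the SFC Grouping strategy concrete, the sketch below sorts points along a space-filling curve and cuts the sorted sequence into groups of nearly equal size. It is only an illustration under stated assumptions and not the authors' implementation: the summary does not say which curve LEST uses, so a Z-order (Morton) curve is assumed here, and the names `morton_code`, `sfc_groups`, `voxel_size`, `bits`, and `group_size` are hypothetical.

```python
import numpy as np


def morton_code(voxels: np.ndarray, bits: int = 10) -> np.ndarray:
    """Interleave the bits of integer (x, y, z) voxel coordinates into a Z-order key.

    Assumption: a Z-order curve stands in for whichever SFC the paper uses.
    """
    v = voxels.astype(np.uint64)
    codes = np.zeros(v.shape[0], dtype=np.uint64)
    for b in range(bits):
        for axis in range(3):
            # Take bit b of this axis and place it at position 3*b + axis.
            bit = (v[:, axis] >> np.uint64(b)) & np.uint64(1)
            codes |= bit << np.uint64(3 * b + axis)
    return codes


def sfc_groups(points: np.ndarray, group_size: int,
               voxel_size: float = 0.1, bits: int = 10) -> list:
    """Return index arrays that partition `points` into near-equal groups.

    Points are quantized to voxels, sorted by their curve key, and the sorted
    order is split into contiguous chunks whose sizes differ by at most one.
    """
    shifted = points - points.min(axis=0)            # make coordinates non-negative
    voxels = np.floor(shifted / voxel_size).astype(np.int64)
    voxels = np.clip(voxels, 0, 2 ** bits - 1)       # stay within the key range
    order = np.argsort(morton_code(voxels, bits), kind="stable")
    n_groups = max(1, -(-len(order) // group_size))  # ceiling division
    return np.array_split(order, n_groups)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cloud = rng.uniform(0.0, 50.0, size=(1000, 3))   # synthetic stand-in for a LiDAR scan
    groups = sfc_groups(cloud, group_size=64)
    print(len(groups), sorted({len(g) for g in groups}))
```

Because points that are close on the curve tend to be close in space, each contiguous chunk acts as a local neighborhood, and the equal chunk sizes mean attention within a group can be batched without padding. A group-wise Transformer would then run on `points[g]` for each index array `g`.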