Scan-based Semantic Segmentation of LiDAR Point Clouds: An Experimental Study

Larissa T. Triess, David Peter, Christoph B. Rist, J. Marius Zöllner
DOI: https://doi.org/10.1109/IV47402.2020.9304631
2021-09-24
Abstract: Autonomous vehicles need to have a semantic understanding of the three-dimensional world around them in order to reason about their environment. State-of-the-art methods use deep neural networks to predict semantic classes for each point in a LiDAR scan. A powerful and efficient way to process LiDAR measurements is to use two-dimensional, image-like projections. In this work, we perform a comprehensive experimental study of image-based semantic segmentation architectures for LiDAR point clouds. We demonstrate various techniques to boost the performance and to improve runtime as well as memory constraints. First, we examine the effect of network size and suggest that much faster inference times can be achieved at a very low cost to accuracy. Next, we introduce an improved point cloud projection technique that does not suffer from systematic occlusions. We use a cyclic padding mechanism that provides context at the horizontal field-of-view boundaries. In a third part, we perform experiments with a soft Dice loss function that directly optimizes for the intersection-over-union metric. Finally, we propose a new kind of convolution layer with a reduced amount of weight-sharing along one of the two spatial dimensions, addressing the large difference in appearance along the vertical axis of a LiDAR scan. We propose a final set of the above methods with which the model achieves an increase of 3.2% in mIoU segmentation performance over the baseline while requiring only 42% of the original inference time.
Computer Vision and Pattern Recognition, Machine Learning
What problem does this paper attempt to address?
This paper asks how to improve the performance of semantic segmentation on LiDAR point clouds while reducing the model's demand for computing resources. The authors explore the question from four angles:

1. **Impact of network size**: The authors study how network size affects accuracy and runtime, and suggest that reducing the number of network parameters speeds up inference considerably at a very low cost to accuracy.
2. **Improved point cloud projection technique**: A new projection method, scan unfolding, is introduced that does not suffer from systematic occlusions. It is paired with a cyclic padding mechanism that provides context at the horizontal field-of-view boundaries.
3. **Choice of loss function**: Experiments compare the cross-entropy loss with a soft Dice loss. The latter directly optimizes the intersection-over-union (IoU) metric and achieves better performance on some classes.
4. **Semi-local convolution layer**: The Semi-Local Convolution (SLC) layer is proposed. It reduces weight sharing along one of the two spatial dimensions to address the large differences in appearance along the vertical axis of a LiDAR scan.

With the final combination of these methods, the model improves mIoU (mean intersection-over-union) by 3.2% over the baseline while requiring only 42% of the original inference time. These gains matter for applications such as autonomous vehicles, which must process large amounts of data in real time.
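As a rough illustration of two of the techniques mentioned above, the following NumPy sketch shows cyclic padding along the horizontal axis of a projected scan and a soft Dice loss. It is not the authors' implementation: the function names, the (H, W, C) array layout, the smoothing term `eps`, and the particular Dice formulation are assumptions chosen for illustration.

```python
import numpy as np


def cyclic_pad_horizontal(image, pad):
    """Pad the width axis of an (H, W, C) image by wrapping around.

    Assumes the image covers a full 360-degree sweep, so the left border
    is filled with columns from the right edge and vice versa.
    """
    return np.pad(image, ((0, 0), (pad, pad), (0, 0)), mode="wrap")


def soft_dice_loss(probs, one_hot, eps=1e-6):
    """Mean soft Dice loss over classes (one common formulation).

    probs:   (H, W, K) predicted class probabilities, e.g. softmax output
    one_hot: (H, W, K) one-hot ground-truth labels
    eps:     hypothetical smoothing term that avoids division by zero
    """
    # Per-class soft overlap between prediction and ground truth.
    intersection = np.sum(probs * one_hot, axis=(0, 1))
    # Per-class sum of predicted mass and ground-truth mass.
    denominator = np.sum(probs, axis=(0, 1)) + np.sum(one_hot, axis=(0, 1))
    dice = (2.0 * intersection + eps) / (denominator + eps)
    return float(1.0 - dice.mean())


# Toy example: a 2 x 4 single-channel image and a two-class label map.
scan = np.arange(8, dtype=np.float32).reshape(2, 4, 1)
padded = cyclic_pad_horizontal(scan, pad=1)

labels = np.zeros((2, 4, 2))
labels[..., 0] = 1.0
loss = soft_dice_loss(labels, labels)
```

In a real training setup the loss would be written with differentiable tensor operations in the chosen deep learning framework; the NumPy version only shows the arithmetic.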