Probabilistic Semantic Mapping for Urban Autonomous Driving Applications

David Paz, Hengyuan Zhang, Qinru Li, Hao Xiang, Henrik Christensen
DOI: https://doi.org/10.48550/arXiv.2006.04894
2020-09-12
Abstract: Recent advancements in statistical learning and computational abilities have enabled autonomous vehicle technology to develop at a much faster rate. While many of the architectures previously introduced are capable of operating under highly dynamic environments, many of these are constrained to smaller-scale deployments, require constant maintenance due to the associated scalability cost with high-definition (HD) maps, and involve tedious manual labeling. As an attempt to tackle this problem, we propose to fuse image and pre-built point cloud map information to perform automatic and accurate labeling of static landmarks such as roads, sidewalks, crosswalks, and lanes. The method performs semantic segmentation on 2D images, associates the semantic labels with point cloud maps to accurately localize them in the world, and leverages the confusion matrix formulation to construct a probabilistic semantic map in bird's eye view from semantic point clouds. Experiments from data collected in an urban environment show that this model is able to predict most road features and can be extended for automatically incorporating road features into HD maps with potential future work directions.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The main problem this paper addresses is how to generate semantic high-definition (HD) maps automatically, so as to reduce the cost of manual annotation and map maintenance and to improve the safety and efficiency of autonomous vehicles driving in urban environments. Specifically, the paper targets the following challenges:

1. **Timeliness of HD maps**: Manually annotated HD maps become outdated quickly, especially where the road network changes frequently. An outdated map can lead to inaccurate reference path tracking for autonomous vehicles, which in turn can create safety hazards.
2. **Scalability of large-scale deployment**: Many existing architectures can operate in highly dynamic environments but are limited to small-scale deployments. They require continuous maintenance, and the cost associated with HD maps makes them difficult to deploy at scale.
3. **Tedious manual annotation process**: When constructing HD maps, extracting semantic information and attributes from the data is labor-intensive, time-consuming, and costly.

To address these problems, the paper proposes a method that fuses image information with a pre-built point cloud map to label static landmarks (such as roads, sidewalks, crosswalks, and lanes) automatically and accurately. The method consists of:

- **Semantic segmentation**: Use a deep learning model to perform semantic segmentation on 2D images and predict the semantic label of each pixel.
- **Semantic association**: Associate the semantic labels with points in the point cloud map to localize these labels accurately in the world coordinate system.
- **Probabilistic semantic mapping**: Use the confusion matrix formulation to construct a probabilistic semantic map in bird's-eye view from the semantic point clouds, thereby capturing the uncertainty of the semantic labels.

The experimental results show that this model can predict most road features in an urban environment and can be extended to incorporate road features into HD maps automatically. In addition, incorporating LiDAR intensity information further improves the segmentation accuracy of lane markings.

### Formula Representation

The probability update rule mentioned in the paper can be represented as:

\[ P(S_t \mid z_{1:t}, I_{1:t}) = \frac{1}{Z} \, P(z_t \mid S_t) \, P(I_t \mid S_t) \, P(S_{t-1} \mid z_{1:t-1}, I_{1:t-1}) \]

where:

- \( S_t \) represents the semantic label distribution at time \( t \);
- \( z_t \) represents the semantic label observed at time \( t \);
- \( I_t \) represents the LiDAR intensity observed at time \( t \);
- \( Z \) is a normalization factor that ensures the probabilities sum to 1.

The element \( M_{ij} \) of the confusion matrix \( M \) represents the likelihood that true label \( i \) is predicted as label \( j \), and is used to model prediction errors. The conditional probability \( P(I_t \mid S_t) \) of the LiDAR intensity \( I_t \) is modeled based on the reflectivity of each category in the scene.
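To make the update rule concrete, the following is a minimal sketch of one map cell being updated recursively. It is not the authors' implementation: the label set, the confusion matrix values, the Gaussian form of the intensity model and its parameters, and the names `update_cell` and `intensity_likelihood` are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical label set and numbers, chosen only for illustration.
LABELS = ["road", "crosswalk", "lane", "sidewalk"]

# M[i, j]: likelihood that true label i is predicted as label j (rows sum to 1).
M = np.array([
    [0.90, 0.03, 0.04, 0.03],
    [0.10, 0.80, 0.08, 0.02],
    [0.15, 0.10, 0.73, 0.02],
    [0.05, 0.01, 0.01, 0.93],
])

# Assumed per-class Gaussian intensity model (mean, std) on a 0..1 scale.
# The summary only says P(I_t | S_t) is based on class reflectivity.
INTENSITY_MEAN = np.array([0.20, 0.60, 0.70, 0.30])
INTENSITY_STD = np.array([0.10, 0.15, 0.15, 0.10])


def intensity_likelihood(intensity):
    """Return P(I_t | S_t) for every label under the assumed Gaussian model."""
    z = (intensity - INTENSITY_MEAN) / INTENSITY_STD
    return np.exp(-0.5 * z**2) / (INTENSITY_STD * np.sqrt(2.0 * np.pi))


def update_cell(prior, observed_label, intensity):
    """Apply one step of the recursive update to a cell's label distribution."""
    label_lik = M[:, observed_label]  # P(z_t | S_t) for every true label
    unnormalized = label_lik * intensity_likelihood(intensity) * prior
    return unnormalized / unnormalized.sum()  # dividing by Z


# Start from a uniform prior and fold in (observed label index, intensity) pairs.
belief = np.full(len(LABELS), 1.0 / len(LABELS))
for z_t, i_t in [(2, 0.72), (2, 0.65), (0, 0.70)]:
    belief = update_cell(belief, z_t, i_t)

print(dict(zip(LABELS, belief.round(4))))
```

In this toy run the third observation is labeled "road", yet the cell still favors "lane": the earlier evidence and the high intensity reading outweigh a single conflicting prediction, which is the behavior the probabilistic formulation is meant to provide.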