Building and optimization of 3D semantic map based on Lidar and camera fusion

Jing Li, Xin Zhang, Jiehao Li, Yanyu Liu, Junzheng Wang
DOI: https://doi.org/10.1016/j.neucom.2020.06.004
IF: 6
2020-10-01
Neurocomputing
Abstract: In complex robot application scenarios, traditional geometric maps are insufficient because they lack interaction with the environment. This paper presents a large-scale, accurate three-dimensional (3D) semantic map that integrates Lidar and camera information to perceive road scenes in real time. First, simultaneous localization and mapping (SLAM) is performed to localize the robot through multi-sensor fusion of Lidar and an inertial measurement unit (IMU), and a map of the surrounding scene is constructed while the robot moves. Next, semantic segmentation of images based on convolutional neural networks (CNNs) is employed to build the semantic map of the environment. After temporal and spatial synchronization, Lidar and camera data are fused to generate semantically labeled point cloud frames, which are then assembled into a semantic map according to the robot pose. In addition, to improve classification performance, a higher-order 3D fully connected conditional random field (CRF) method is used to optimize the semantic map. Finally, extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed method.
computer science, artificial intelligence
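The Lidar-camera fusion step described in the abstract can be illustrated with a minimal sketch: each Lidar point is transformed into the camera frame, projected through a pinhole model, and assigned the class of the pixel it lands on. This is not the authors' implementation. The names `project_point`, `label_points`, and `UNLABELED` are hypothetical, and the sketch assumes the scan and image are already time-synchronized, the extrinsic calibration is given as a 3x4 matrix `[R | t]`, and the segmentation output is a 2D grid of integer class IDs.

```python
import math
from typing import List, Optional, Sequence, Tuple

UNLABELED = -1  # hypothetical marker for points that receive no label

Point = Tuple[float, float, float]
Matrix = Sequence[Sequence[float]]


def project_point(point: Point, extrinsic: Matrix,
                  intrinsic: Matrix) -> Optional[Tuple[float, float]]:
    """Project a Lidar point to pixel coordinates (u, v).

    extrinsic is an assumed 3x4 [R | t] Lidar-to-camera transform and
    intrinsic an assumed 3x3 pinhole matrix. Returns None when the point
    lies behind the camera.
    """
    x, y, z = point
    # Rigid transform from the Lidar frame into the camera frame.
    xc, yc, zc = (r[0] * x + r[1] * y + r[2] * z + r[3] for r in extrinsic)
    if zc <= 0.0:
        return None
    # Pinhole projection followed by the perspective divide.
    u = (intrinsic[0][0] * xc + intrinsic[0][1] * yc + intrinsic[0][2] * zc) / zc
    v = (intrinsic[1][0] * xc + intrinsic[1][1] * yc + intrinsic[1][2] * zc) / zc
    return u, v


def label_points(points: Sequence[Point], label_image: Sequence[Sequence[int]],
                 extrinsic: Matrix, intrinsic: Matrix) -> List[int]:
    """Assign each point the class ID of the pixel it projects onto."""
    height, width = len(label_image), len(label_image[0])
    labels = []
    for point in points:
        pixel = project_point(point, extrinsic, intrinsic)
        if pixel is None:
            labels.append(UNLABELED)
            continue
        col, row = math.floor(pixel[0]), math.floor(pixel[1])
        if 0 <= row < height and 0 <= col < width:
            labels.append(label_image[row][col])
        else:
            # The point falls outside the camera's field of view.
            labels.append(UNLABELED)
    return labels
```

Points outside the image or behind the camera stay unlabeled here; a full pipeline would also have to handle occlusion and accumulate labeled frames into a map using the SLAM pose, which this sketch leaves out.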
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper addresses the insufficiency of traditional geometric maps, which lack interaction with the environment, when robots operate in complex scenarios. Specifically, it proposes a method for constructing and optimizing large-scale, high-precision 3D semantic maps based on the fusion of Lidar and cameras, in order to perceive road scenes in real time.

#### Problem background

1. **Limitations of traditional geometric maps**:
   - Traditional geometric maps provide only geometric information about the environment and cannot meet the robot's need to understand it.
   - 3D environmental information constructed from a single sensor (such as an RGB-D camera or a binocular camera) suffers from high computational complexity, poor real-time performance, and sensitivity to illumination and texture, so it is difficult to achieve satisfactory performance in large-scale, complex outdoor environments.
2. **Advantages of multi-sensor fusion**:
   - Lidar can accurately obtain 3D data of objects at long range and offers high stability and flexibility, which makes it suitable for real-time localization and map construction.
   - Cameras can provide semantic information directly through deep-learning-based image semantic segmentation (for example, with convolutional neural networks (CNNs)), avoiding the complexity of stereo-matching calculations.

#### Solutions

The proposed method fuses Lidar and camera data to address the problems above while meeting the real-time and accuracy requirements of the map-building process. The main contributions are:

1. **Multi-sensor fusion for constructing real-time 3D semantic maps**:
   - Combining Lidar and camera information overcomes the small application range and high computational complexity of traditional RGB-D and binocular vision sensors.
2. **Fast semantic segmentation architecture based on optimized PSPNet-50**:
   - A new network structure based on a simplified PSPNet-50 is proposed, which trades off speed against accuracy to meet the needs of semantic map construction.
3. **Optimization with a higher-order 3D fully connected conditional random fields (CRFs) model**:
   - A higher-order 3D CRFs model is designed to optimize the initial semantic map, further improving the accuracy of the 3D semantic map.

Through these methods, the paper achieves efficient and accurate construction of 3D semantic maps and supports advanced scene-interaction tasks (such as target grasping and object searching), thereby improving the efficiency of navigation, localization, and autonomous driving.
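To make the CRF refinement idea concrete, the following is a minimal sketch of naive mean-field inference for a fully connected CRF over labeled 3D points. It is a deliberate simplification and not the paper's model: it uses only pairwise terms (no higher-order potentials), a single Gaussian kernel on 3D position, and a Potts label compatibility. The function name `mean_field_crf` and the parameters `sigma`, `weight`, and `iterations` are assumptions made for illustration, and the O(N²) loop is only practical for small point sets.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_field_crf(points: Sequence[Sequence[float]],
                   unary_probs: Sequence[Sequence[float]],
                   sigma: float = 1.0, weight: float = 1.0,
                   iterations: int = 5) -> List[List[float]]:
    """Refine per-point class probabilities with a pairwise dense CRF.

    points:      3D coordinates, one per point.
    unary_probs: initial class probabilities per point (e.g. from the
                 projected image segmentation).
    Returns the refined class distribution for every point.
    """
    n = len(points)
    num_labels = len(unary_probs[0])
    eps = 1e-9
    # Unary energies are negative log-probabilities.
    unary = [[-math.log(max(p, eps)) for p in row] for row in unary_probs]

    # Gaussian affinity between every pair of points (diagonal stays zero).
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            kernel[i][j] = kernel[j][i] = math.exp(-d2 / (2.0 * sigma ** 2))

    q = [softmax([-u for u in row]) for row in unary]
    for _ in range(iterations):
        new_q = []
        for i in range(n):
            total = sum(kernel[i])
            energies = []
            for label in range(num_labels):
                # Affinity-weighted support from neighbours for this label.
                support = sum(kernel[i][j] * q[j][label] for j in range(n))
                # Potts model: pay a penalty for disagreeing neighbours.
                energies.append(unary[i][label] + weight * (total - support))
            new_q.append(softmax([-e for e in energies]))
        q = new_q
    return q
```

The intended effect is that a weakly classified point surrounded by confidently labeled neighbours is pulled toward their class, while distant points are left essentially unchanged. Practical dense-CRF implementations replace the quadratic message passing with efficient high-dimensional filtering.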