PanoSSC: Exploring Monocular Panoptic 3D Scene Reconstruction for Autonomous Driving

Yining Shi, Jiusi Li, Kun Jiang, Ke Wang, Yunlong Wang, Mengmeng Yang, Diange Yang
2024-06-11
Abstract: Vision-centric occupancy networks, which represent the surrounding environment as uniform voxels with semantic labels, have become a new trend for safe driving in camera-only autonomous driving perception systems, because they can detect obstacles regardless of their shape and occlusion. Modern occupancy networks mainly focus on reconstructing visible voxels on object surfaces with voxel-wise semantic prediction. They often suffer from inconsistent predictions within a single object and mixed predictions for adjacent objects. These confusions may harm the safety of downstream planning modules. To this end, we investigate panoptic segmentation in 3D voxel scenes and propose an instance-aware occupancy network, PanoSSC. We predict foreground objects and background separately and merge both in post-processing. For foreground instance grouping, we propose a novel 3D instance mask decoder that efficiently extracts individual objects. We unify geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation in the PanoSSC framework and propose new metrics for evaluating panoptic voxels. Extensive experiments show that our method achieves competitive results on the SemanticKITTI semantic scene completion benchmark.
Computer Vision and Pattern Recognition
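The abstract's post-processing step, predicting foreground objects and background separately and then merging them into one panoptic voxel grid, can be sketched as follows. This is a minimal sketch under stated assumptions, not PanoSSC's actual procedure. It assumes the background branch outputs one semantic label per voxel on a flattened grid (label 0 = empty), and that the instance decoder outputs a class, a confidence score and a set of voxel indices per object. It uses a generic greedy "paste instances by descending score" heuristic common in 2D panoptic segmentation. The names `Instance`, `merge_panoptic`, `EMPTY`, `min_score` and `min_keep` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

EMPTY = 0  # hypothetical label id for free (unoccupied) voxels


@dataclass
class Instance:
    """One foreground object from a hypothetical instance mask decoder."""
    class_id: int
    score: float
    voxels: set[int]  # flat indices into the voxel grid


def merge_panoptic(background: list[int], instances: list[Instance],
                   min_score: float = 0.5, min_keep: float = 0.5):
    """Merge background semantics and foreground instances into panoptic voxels.

    Returns two per-voxel lists: semantic labels and instance ids (0 = no instance).
    Instances are pasted greedily by descending score; voxels already claimed by a
    higher-scoring instance are kept, and an instance that would keep fewer than
    min_keep of its voxels is dropped as a likely duplicate.
    """
    semantic = list(background)           # start from the background branch
    instance_id = [0] * len(background)   # 0 marks stuff or empty voxels
    next_id = 1
    for inst in sorted(instances, key=lambda i: i.score, reverse=True):
        if inst.score < min_score or not inst.voxels:
            continue
        free = [v for v in inst.voxels if instance_id[v] == 0]
        if len(free) < min_keep * len(inst.voxels):
            continue
        for v in free:                    # foreground overrides background labels
            semantic[v] = inst.class_id
            instance_id[v] = next_id
        next_id += 1
    return semantic, instance_id


# Toy 6-voxel grid; class ids 1 ("car") and 9 ("road") are hypothetical.
bg = [EMPTY, 9, 9, 9, EMPTY, 9]
cars = [Instance(1, 0.9, {1, 2}), Instance(1, 0.8, {2, 3}), Instance(1, 0.3, {5})]
sem, ids = merge_panoptic(bg, cars)
# sem == [0, 1, 1, 1, 0, 9], ids == [0, 1, 1, 2, 0, 0]
```

Letting instances overwrite background labels reflects the assumption that the foreground decoder is the more reliable source for object voxels. A real system might instead resolve conflicts by class or by comparing confidences from both branches.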
### What problems does this paper attempt to solve?

This paper addresses panoptic 3D scene reconstruction from monocular images, particularly in the context of autonomous driving. Specifically, the authors propose PanoSSC (Panoptic Semantic Scene Completion), a method that predicts voxel-level occupancy, semantic labels, and instance IDs from monocular RGB images. The main problems addressed are:

1. **Limitations of existing methods**:
   - Existing semantic occupancy networks mainly reconstruct visible voxels on object surfaces and predict semantics voxel by voxel. These methods struggle with occluded, deformed, or semantically ambiguous obstacles.
   - Current methods have explored instance extraction very little, which leads to inconsistent semantic predictions within the same object and confusion between adjacent objects. These errors may affect the safety of downstream planning modules.
2. **Proposed new task**:
   - The paper proposes a new task, panoptic 3D scene reconstruction, which predicts the occupancy state, semantic label, and instance ID of each voxel to provide a more complete representation of the environment.
3. **Technical challenges**:
   - How to efficiently extract 3D information from monocular images and perform semantic and instance segmentation on it.
   - How to design a single framework that performs geometric reconstruction, 3D semantic segmentation, and 3D instance segmentation simultaneously, so that the tasks reinforce one another.
4. **Application scenarios**:
   - The method is especially suited to autonomous driving systems, because a more accurate understanding of the surrounding environment supports safe and reliable driving.
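Item 2 defines the output as an occupancy state, a semantic label and an instance ID per voxel, and the abstract mentions new metrics for evaluating such panoptic voxels, which are not detailed here. As a hedged illustration only, the standard panoptic quality (PQ) metric from 2D panoptic segmentation can be applied to voxels by treating each (class, instance ID) group of occupied voxels as one segment and matching predicted and ground-truth segments at IoU > 0.5. The sketch is simplified (no void or ignore labels), and the names `segments`, `panoptic_quality` and `EMPTY` are hypothetical. Its toy example also illustrates the "mixed predictions for adjacent objects" problem: merging two adjacent cars into one instance leaves every semantic label correct, yet the car class scores PQ 0.

```python
from collections import defaultdict

EMPTY = 0  # hypothetical label id for free space; empty voxels form no segment


def segments(semantic, instance_id):
    """Group occupied voxels into segments keyed by (class, instance id)."""
    segs = defaultdict(set)
    for v, (c, i) in enumerate(zip(semantic, instance_id)):
        if c != EMPTY:
            segs[(c, i)].add(v)
    return segs


def panoptic_quality(pred_sem, pred_ids, gt_sem, gt_ids):
    """Class-averaged PQ over voxel segments (simplified: no void/ignore handling)."""
    pred, gt = segments(pred_sem, pred_ids), segments(gt_sem, gt_ids)
    classes = {c for c, _ in pred} | {c for c, _ in gt}
    per_class = {}
    for c in sorted(classes):
        p_segs = [s for (k, _), s in pred.items() if k == c]
        g_segs = [s for (k, _), s in gt.items() if k == c]
        iou_sum, tp, matched = 0.0, 0, set()
        for g in g_segs:
            for j, p in enumerate(p_segs):
                iou = len(g & p) / len(g | p)
                # IoU > 0.5 guarantees each segment matches at most one other
                if j not in matched and iou > 0.5:
                    iou_sum += iou
                    tp += 1
                    matched.add(j)
                    break
        fp, fn = len(p_segs) - tp, len(g_segs) - tp
        # denominator > 0 because class c has at least one segment
        per_class[c] = iou_sum / (tp + 0.5 * fp + 0.5 * fn)
    mean = sum(per_class.values()) / len(per_class) if per_class else 0.0
    return mean, per_class


# Two adjacent cars (class 1) on road (class 9); class ids are hypothetical.
gt_sem = [0, 1, 1, 1, 1, 9, 9]
gt_ids = [0, 1, 1, 2, 2, 0, 0]
pred_sem = [0, 1, 1, 1, 1, 9, 9]  # every semantic label is correct...
pred_ids = [0, 1, 1, 1, 1, 0, 0]  # ...but the two cars are merged into one instance
pq, per_class = panoptic_quality(pred_sem, pred_ids, gt_sem, gt_ids)
# per_class == {1: 0.0, 9: 1.0}, pq == 0.5
```

A purely voxel-wise semantic metric would rate this prediction as perfect, which is why an instance-aware metric is needed to score the task in item 2.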
With the PanoSSC framework, the authors aim to achieve more detailed and accurate 3D scene understanding for autonomous driving, particularly in complex and diverse outdoor environments.