Panoptic-FlashOcc: An Efficient Baseline to Marry Semantic Occupancy with Panoptic via Instance Center

Zichen Yu, Changyong Shu, Qianpu Sun, Yifan Bian, Xiaobao Wei, Jiangyong Yu, Zongdai Liu, Dawei Yang, Hui Li, Yan Chen
2024-10-24
Abstract: Panoptic occupancy poses a novel challenge by aiming to integrate instance occupancy and semantic occupancy within a unified framework. However, efficient solutions for panoptic occupancy are still lacking. In this paper, we propose Panoptic-FlashOcc, a straightforward yet robust 2D feature framework that enables real-time panoptic occupancy. Building upon the lightweight design of FlashOcc, our approach simultaneously learns semantic occupancy and class-aware instance clustering in a single network; these outputs are then combined by a panoptic occupancy processing step to produce panoptic occupancy. This approach avoids the high memory and computation requirements of three-dimensional voxel-level representations. With a straightforward and efficient design that facilitates deployment, Panoptic-FlashOcc achieves strong results in panoptic occupancy prediction. On the Occ3D-nuScenes benchmark, it achieves 38.5 RayIoU and 29.1 mIoU for semantic occupancy while running at 43.9 FPS. It also attains 16.0 RayPQ for panoptic occupancy at an inference speed of 30.2 FPS. These results surpass existing methods in both speed and accuracy. The source code and trained models are available at https://github.com/Yzichen/FlashOCC.
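The panoptic occupancy processing step combines a semantic occupancy grid with predicted class-aware instance centers. The following is a minimal sketch under our own assumptions, not the authors' implementation: the function name `fuse_panoptic`, the dictionary-of-voxels input format, the rule "attach each voxel of a *thing* class to the nearest same-class center in the BEV (x, y) plane", and the convention that instance id 0 marks stuff or unassigned voxels are all hypothetical. It uses plain loops for readability; an efficient version would presumably express the same distance comparisons as batched matrix and logical operations, with no trainable parameters.

```python
from typing import Dict, Sequence, Set, Tuple

Voxel = Tuple[int, int, int]       # (x, y, z) grid index
Center = Tuple[float, float, int]  # (x, y, class_id) in BEV grid units


def fuse_panoptic(
    semantic: Dict[Voxel, int],
    centers: Sequence[Center],
    thing_classes: Set[int],
) -> Dict[Voxel, Tuple[int, int]]:
    """Hypothetical sketch: map each occupied voxel to (class_id, instance_id).

    Voxels of a 'thing' class get the 1-based index of the nearest predicted
    center of the same class (BEV distance). Stuff voxels, and thing voxels
    with no same-class center, get instance_id 0. Nothing here is learned.
    """
    panoptic: Dict[Voxel, Tuple[int, int]] = {}
    for (x, y, z), cls in semantic.items():
        if cls not in thing_classes:
            panoptic[(x, y, z)] = (cls, 0)  # stuff: semantic label only
            continue
        best_id, best_d2 = 0, float("inf")
        for inst_id, (cx, cy, ccls) in enumerate(centers, start=1):
            if ccls != cls:
                continue  # class-aware: only same-class centers compete
            d2 = (x - cx) ** 2 + (y - cy) ** 2  # squared BEV distance
            if d2 < best_d2:
                best_id, best_d2 = inst_id, d2
        panoptic[(x, y, z)] = (cls, best_id)
    return panoptic


# Usage: two car instances and one road voxel (class ids are illustrative).
CAR, ROAD = 1, 2
semantic = {(0, 0, 0): ROAD, (1, 1, 0): CAR, (1, 2, 1): CAR, (8, 8, 0): CAR}
centers = [(1.0, 1.5, CAR), (8.0, 7.0, CAR)]
print(fuse_panoptic(semantic, centers, thing_classes={CAR}))
# → {(0, 0, 0): (2, 0), (1, 1, 0): (1, 1), (1, 2, 1): (1, 1), (8, 8, 0): (1, 2)}
```

Restricting the search to same-class centers is what makes the clustering class-aware: a car voxel can never be absorbed by a nearby pedestrian center, even if that center is closer.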
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### What problems does this paper attempt to solve?

This paper addresses the efficient prediction of multi-view 3D panoptic occupancy. Specifically, the authors propose **Panoptic-FlashOcc**, a model that integrates instance occupancy and semantic occupancy within a unified framework to achieve efficient, real-time panoptic occupancy prediction.

#### Main problems and challenges

1. **Lack of efficient solutions**:
   - Current 3D panoptic occupancy methods have high memory and computational requirements, particularly those built on 3D voxel-level representations.
2. **Real-time performance and accuracy**:
   - Applications such as autonomous driving, robot navigation, and environmental mapping require models that are not only accurate but also fast at inference and deployable on edge devices.
3. **Complexity of network design**:
   - The 3D panoptic occupancy task must handle semantic segmentation and instance discrimination simultaneously, which places higher demands on network design.

#### Solutions

To address these challenges, the authors propose **Panoptic-FlashOcc**, which has the following main features:

1. **Lightweight design**:
   - Built on the **FlashOcc** architecture, it avoids expensive 3D voxel-level representations by converting flattened bird's-eye-view (BEV) features into 3D occupancy predictions.
2. **Instance-centric prediction**:
   - A lightweight centerness head generates class-aware instance centers, improving the model's ability to distinguish instances.
3. **Efficient post-processing module**:
   - The panoptic occupancy processing module relies only on matrix and logical operations and has no trainable parameters, which keeps inference fast.
4. **Strong performance**:
   - On the Occ3D-nuScenes dataset, **Panoptic-FlashOcc** reaches 38.5 RayIoU and 29.1 mIoU for semantic occupancy at 43.9 FPS, and 16.0 RayPQ for panoptic occupancy at 30.2 FPS.

Through these design choices, **Panoptic-FlashOcc** addresses the efficiency and real-time constraints of current 3D panoptic occupancy methods and offers a more practical option for real-world applications.
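The BEV-to-3D conversion listed under *Lightweight design* can be illustrated as a pure re-indexing step: a 2D head predicts Z × C channels per BEV cell, and these channels are reinterpreted as C class logits at each of Z height bins, so no 3D convolutions are needed. This is a minimal sketch under stated assumptions, not FlashOcc's code: the function names `channel_to_height` and `argmax_labels` are hypothetical, and the height-major channel layout (channel k = z * num_classes + c) is our assumption; any fixed layout works as long as training and inference agree. Plain nested lists are used so the index mapping is explicit.

```python
from typing import List

Grid2D = List[List[float]]  # [h][w] logits for one channel


def channel_to_height(bev: List[Grid2D], num_classes: int, num_heights: int) -> List[List[Grid2D]]:
    """Reinterpret a (Z*C, H, W) BEV map as a (Z, C, H, W) occupancy volume.

    Assumed layout: channel index k = z * num_classes + c (height-major).
    Like a tensor reshape, the returned grids are shared references, not copies.
    """
    if len(bev) != num_heights * num_classes:
        raise ValueError(f"expected {num_heights * num_classes} channels, got {len(bev)}")
    return [[bev[z * num_classes + c] for c in range(num_classes)] for z in range(num_heights)]


def argmax_labels(volume: List[List[Grid2D]]) -> List[List[List[int]]]:
    """Per-voxel semantic label = argmax over the class axis; returns labels[z][h][w]."""
    labels: List[List[List[int]]] = []
    for per_class in volume:  # one height slice, indexed per_class[c][h][w]
        num_c = len(per_class)
        h_dim, w_dim = len(per_class[0]), len(per_class[0][0])
        labels.append([
            [max(range(num_c), key=lambda c: per_class[c][h][w]) for w in range(w_dim)]
            for h in range(h_dim)
        ])
    return labels


# Usage: Z=2 height bins, C=2 classes, a 1x2 BEV grid -> 4 input channels.
bev = [
    [[0.9, 0.1]],  # z=0, c=0
    [[0.2, 0.8]],  # z=0, c=1
    [[0.3, 0.4]],  # z=1, c=0
    [[0.6, 0.5]],  # z=1, c=1
]
volume = channel_to_height(bev, num_classes=2, num_heights=2)
print(argmax_labels(volume))  # → [[[0, 1]], [[1, 1]]]
```

Because the lift from BEV to 3D is only a change of indexing, all learned computation stays in 2D convolutions, which is the source of the memory and speed savings described above.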