FastOcc: Accelerating 3D Occupancy Prediction by Fusing the 2D Bird's-Eye View and Perspective View

Jiawei Hou, Xiaoyan Li, Wenhao Guan, Gang Zhang, Di Feng, Yuheng Du, Xiangyang Xue, Jian Pu
2024-03-05
Abstract: In autonomous driving, 3D occupancy prediction outputs voxel-wise status and semantic labels for more comprehensive understandings of 3D scenes compared with traditional perception tasks, such as 3D object detection and bird's-eye view (BEV) semantic segmentation. Recent researchers have extensively explored various aspects of this task, including view transformation techniques, ground-truth label generation, and elaborate network design, aiming to achieve superior performance. However, the inference speed, crucial for running on an autonomous vehicle, is neglected. To this end, a new method, dubbed FastOcc, is proposed. By carefully analyzing the network effect and latency from four parts, including the input image resolution, image backbone, view transformation, and occupancy prediction head, it is found that the occupancy prediction head holds considerable potential for accelerating the model while keeping its accuracy. Targeted at improving this component, the time-consuming 3D convolution network is replaced with a novel residual-like architecture, where features are mainly digested by a lightweight 2D BEV convolution network and compensated by integrating the 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark demonstrate that our FastOcc achieves state-of-the-art results with a fast inference speed.
Computer Vision and Pattern Recognition, Robotics
What problem does this paper attempt to address?
This paper addresses 3D occupancy prediction in autonomous driving, a key task because it provides a more comprehensive understanding of the 3D scene than traditional perception tasks such as 3D object detection and bird's-eye-view (BEV) semantic segmentation. Existing research has explored view transformation techniques, ground-truth label generation, and network design, but has largely overlooked inference speed, which is crucial on an autonomous vehicle. The paper proposes FastOcc, a method that accelerates 3D occupancy prediction while maintaining accuracy. It analyzes the accuracy impact and latency of four parts of the network (input image resolution, image backbone, view transformation, and occupancy prediction head) and finds that the occupancy prediction head offers the most room for reducing latency without losing accuracy.
FastOcc replaces the time-consuming 3D convolutional network in that head with a residual-like design: features are compressed into a BEV representation and decoded by a lightweight 2D convolutional network, and the result is then compensated with 3D voxel features interpolated from the original image features. Experiments on the Occ3D-nuScenes benchmark show that FastOcc achieves state-of-the-art results with fast inference. The latency of a single inference is reduced to 63 milliseconds, and to 32 milliseconds when accelerated with the TensorRT SDK. The paper also compares the accuracy and runtime of different methods, and contrasts 3D occupancy prediction with traditional perception tasks such as 3D object detection. Additionally, the paper describes the training loss function.
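The residual-like head described above can be illustrated with a toy, pure-Python sketch. It is not the authors' implementation: the function names are hypothetical, a 3x3 mean filter stands in for the learned 2D BEV network, features are single scalars rather than channel vectors, and the interpolated 3D voxel features are taken as a given input instead of being sampled from image features.

```python
from typing import List

Grid2D = List[List[float]]        # indexed [x][y]
Grid3D = List[List[List[float]]]  # indexed [x][y][z]


def collapse_to_bev(voxels: Grid3D) -> Grid2D:
    """Compress the height (z) axis by averaging: one value per BEV cell."""
    return [[sum(col) / len(col) for col in row] for row in voxels]


def box_filter_2d(bev: Grid2D) -> Grid2D:
    """Stand-in for the lightweight 2D BEV network (assumption):
    a 3x3 mean filter over the neighbours that fall inside the grid."""
    nx, ny = len(bev), len(bev[0])
    out = [[0.0] * ny for _ in range(nx)]
    for x in range(nx):
        for y in range(ny):
            vals = [bev[i][j]
                    for i in range(max(0, x - 1), min(nx, x + 2))
                    for j in range(max(0, y - 1), min(ny, y + 2))]
            out[x][y] = sum(vals) / len(vals)
    return out


def fuse(bev: Grid2D, voxel_residual: Grid3D) -> Grid3D:
    """Broadcast each BEV value along z and add the per-voxel compensation."""
    return [[[bev[x][y] + r for r in col]
             for y, col in enumerate(row)]
            for x, row in enumerate(voxel_residual)]


# Toy 2x2x2 voxel grid and a hypothetical interpolated-feature residual.
voxels = [[[1.0, 3.0], [2.0, 2.0]], [[0.0, 0.0], [4.0, 6.0]]]
residual = [[[0.5, -0.5], [0.0, 0.0]], [[1.0, 1.0], [0.0, 0.0]]]

bev = box_filter_2d(collapse_to_bev(voxels))
occupancy_features = fuse(bev, residual)
```

The expensive part of the computation runs on a 2D grid, while the height-dependent detail is restored by a cheap element-wise addition, which is the intuition behind replacing the 3D convolution network.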
In summary, FastOcc is a fast and efficient method for 3D occupancy prediction that improves scene understanding and real-time perception for autonomous driving.
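To put the latency figures quoted above in perspective, the short sketch below converts them to throughput. The helper names are illustrative, and the calculation assumes frames are processed one at a time with no batching or pipelining.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Frames per second when each frame takes latency_ms to process."""
    return 1000.0 / latency_ms


def speedup(before_ms: float, after_ms: float) -> float:
    """How many times faster the second latency is than the first."""
    return before_ms / after_ms


base_fps = fps_from_latency_ms(63.0)      # about 15.9 FPS
tensorrt_fps = fps_from_latency_ms(32.0)  # 31.25 FPS
trt_gain = speedup(63.0, 32.0)            # about 1.97x
```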