Abstract: 3D semantic occupancy prediction, which aims to provide an accurate and comprehensive representation of the surrounding scene, is important to autonomous driving systems. For autonomous vehicles equipped with multiple cameras and LiDAR, it is critical to aggregate multi-sensor information into a unified 3D space for accurate and robust predictions. Recent methods are mainly built on a 2D-to-3D transformation that relies on sensor calibration to project 2D image information into 3D space. These methods, however, suffer from two major limitations. First, they rely on accurate sensor calibration and are sensitive to calibration noise, which limits their application in complex real-world environments. Second, the spatial transformation layers are computationally expensive, which limits their deployment on an autonomous vehicle. In this work, we explore a Robust and Efficient 3D semantic Occupancy (REO) prediction scheme. To this end, we propose a calibration-free spatial transformation based on vanilla attention to implicitly model the spatial correspondence. In this way, we robustly project the 2D features onto a predefined BEV plane without using sensor calibration as input. Then, we introduce 2D and 3D auxiliary training tasks to enhance the discrimination power of the 2D backbones on spatial, semantic, and texture features. Finally, we propose a query-based prediction scheme to efficiently generate large-scale fine-grained occupancy predictions. By fusing point clouds that provide complementary spatial information, REO surpasses existing methods by a large margin on three benchmarks: OpenOccupancy, Occ3D-nuScenes, and SemanticKITTI Scene Completion. For instance, REO achieves a 19.8× speedup over Co-Occ, with a 1.1 improvement in geometry IoU on OpenOccupancy. Our code will be available at https://github.com/ICEORY/REO.
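
The calibration-free transformation described in the abstract can be illustrated with a minimal sketch of vanilla cross-attention, in which one learnable query per BEV cell attends over the flattened image features of all cameras. This is not the authors' implementation: the function name `bev_cross_attention`, the single-head design, the tensor sizes, and the random weights are all assumptions made for illustration. The point is only that no camera intrinsics or extrinsics appear anywhere in the computation.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def bev_cross_attention(bev_queries, image_features, w_q, w_k, w_v):
    """Hypothetical single-head cross-attention from BEV cells to image tokens.

    bev_queries:    (n_bev, dim) one embedding per cell of the BEV plane
    image_features: (n_img, dim) flattened 2D features from all cameras
    Returns the attended BEV features and the attention weights.
    """
    q = bev_queries @ w_q
    k = image_features @ w_k
    v = image_features @ w_v
    # Scaled dot-product scores; the 2D-to-BEV correspondence is carried
    # entirely by these weights, not by a calibration matrix.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    weights = softmax(scores, axis=-1)
    return weights @ v, weights


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
num_cams, feat_h, feat_w, dim = 6, 4, 8, 16
bev_h, bev_w = 5, 5

image_features = rng.normal(size=(num_cams * feat_h * feat_w, dim))
bev_queries = rng.normal(size=(bev_h * bev_w, dim))  # learnable in practice
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))

bev_features, attn = bev_cross_attention(
    bev_queries, image_features, w_q, w_k, w_v
)
bev_map = bev_features.reshape(bev_h, bev_w, dim)
```

In a trained model the queries and projection matrices would be learned, so the attention weights would come to encode which image regions correspond to each BEV cell. Dense attention of this kind scales with the product of BEV cells and image tokens, which is one reason a coarse predefined BEV plane is a natural target for it.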

3Dopformer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

Learning Occupancy for Monocular 3D Object Detection

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

OCC-VO: Dense Mapping via 3D Occupancy-Based Visual Odometry for Autonomous Driving

Robust 3D Semantic Occupancy Prediction with Calibration-free Spatial Transformation

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

Cam4DOcc: Benchmark for Camera-Only 4D Occupancy Forecasting in Autonomous Driving Applications

Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

Towards Flexible 3D Perception: Object-Centric Occupancy Completion Augments 3D Object Detection

A Simple Framework for 3D Occupancy Estimation in Autonomous Driving

ViPOcc: Leveraging Visual Priors from Vision Foundation Models for Single-View 3D Occupancy Prediction

MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation

OccupancyDETR: Using DETR for Mixed Dense-sparse 3D Occupancy Prediction

Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction

AdaptiveOcc: Adaptive Octree-based Network for Multi-Camera 3D Semantic Occupancy Prediction in Autonomous Driving

OccNeRF: Advancing 3D Occupancy Prediction in LiDAR-Free Environments