Abstract: As a novel 3D scene representation, semantic occupancy has gained much attention in autonomous driving. However, existing occupancy prediction methods mainly focus on designing better occupancy representations, such as tri-perspective view or neural radiance fields, while ignoring the advantages of long-term temporal information. In this paper, we propose a radar-camera multi-modal temporal enhanced occupancy prediction network, dubbed TEOcc. Our method is inspired by the success of temporal information in 3D object detection. Specifically, we introduce a temporal enhancement branch to learn temporal occupancy prediction. In this branch, we randomly discard the multi-view camera input frame at time t-k and predict its 3D occupancy with separate long-term and short-term temporal decoders, using information from the other adjacent frames and the multi-modal inputs. To reduce computational cost and incorporate multi-modal inputs, we design dedicated 3D convolutional layers for the long-term and short-term temporal decoders. Furthermore, since the lightweight occupancy prediction head is a dense classification head, we share a single occupancy prediction head between the temporal enhancement branch and the main branch. The temporal enhancement branch is used only during training and is discarded at inference. Experimental results demonstrate that TEOcc achieves state-of-the-art occupancy prediction on nuScenes benchmarks. In addition, the proposed temporal enhancement branch is a plug-and-play module that can be easily integrated into existing occupancy prediction methods to improve their performance. The code and models will be released at https://github.com/VDIGPKU/TEOcc.
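
The frame-discarding step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `drop_random_frame`, the oldest-to-newest ordering of the frame list, and the choice to never drop the current frame t are all assumptions made here for illustration.

```python
import random


def drop_random_frame(frames, rng):
    """Split one randomly chosen past frame off as a training target.

    Hypothetical helper, not from the TEOcc code base.
    frames: list ordered oldest -> newest; the last entry is the current
            frame t, which this sketch assumes is never dropped.
    Returns (k, dropped_frame, kept_frames), where the dropped frame is t-k.
    """
    if len(frames) < 2:
        raise ValueError("need at least one past frame besides the current one")
    # Offset k >= 1 so that frame t-k is always a past frame.
    k = rng.randrange(1, len(frames))
    drop_idx = len(frames) - 1 - k
    kept = [f for i, f in enumerate(frames) if i != drop_idx]
    return k, frames[drop_idx], kept


# Usage: the kept frames would feed the temporal decoders during training,
# and the occupancy of the dropped frame would serve as the prediction target.
rng = random.Random(0)
frames = ["f0", "f1", "f2", "f3"]  # placeholder stand-ins for camera frames
k, dropped, kept = drop_random_frame(frames, rng)
```

Because the abstract states that the branch is used only during training, a step like this would run inside the training loop and be skipped entirely at inference.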

AdaptiveOcc: Adaptive Octree-based Network for Multi-Camera 3D Semantic Occupancy Prediction in Autonomous Driving

AdaOcc: Adaptive-Resolution Occupancy Prediction

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

TEOcc: Radar-camera Multi-modal Occupancy Prediction via Temporal Enhancement

Occ3D: A Large-Scale 3D Occupancy Prediction Benchmark for Autonomous Driving

Real-Time 3D Occupancy Prediction via Geometric-Semantic Disentanglement

MonoOcc: Digging into Monocular Semantic Occupancy Prediction

HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction

Fully Sparse 3D Occupancy Prediction

OccGen: Generative Multi-modal 3D Occupancy Prediction for Autonomous Driving

OctreeOcc: Efficient and Multi-Granularity Occupancy Prediction Using Octree Queries

$α$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction

SelfOcc: Self-Supervised Vision-Based 3D Occupancy Prediction

OPUS: Occupancy Prediction Using a Sparse Set

PanoOcc: Unified Occupancy Representation for Camera-based 3D Panoptic Segmentation

OccFormer: Dual-path Transformer for Vision-based 3D Semantic Occupancy Prediction

Scene as Occupancy