Abstract: As a novel 3D scene representation, semantic occupancy has gained much attention in autonomous driving. However, existing occupancy prediction methods mainly focus on designing better occupancy representations, such as tri-perspective view or neural radiance fields, while ignoring the advantages of long-term temporal information. In this paper, we propose a radar-camera multi-modal temporal enhanced occupancy prediction network, dubbed TEOcc. Our method is inspired by the success of utilizing temporal information in 3D object detection. Specifically, we introduce a temporal enhancement branch to learn temporal occupancy prediction. In this branch, we randomly discard the multi-view camera input frame at time t-k and predict its 3D occupancy with long-term and short-term temporal decoders separately, using information from the other adjacent frames and the multi-modal inputs. To reduce computational cost and incorporate multi-modal inputs, we design dedicated 3D convolutional layers for the long-term and short-term temporal decoders. Furthermore, since the lightweight occupancy prediction head is a dense classification head, we share a single occupancy prediction head between the temporal enhancement branch and the main branch. The temporal enhancement branch is used only during training and is discarded during inference. Experimental results demonstrate that TEOcc achieves state-of-the-art occupancy prediction on the nuScenes benchmarks. In addition, the proposed temporal enhancement branch is a plug-and-play module that can be easily integrated into existing occupancy prediction methods to improve their performance. The code and models will be released at https://github.com/VDIGPKU/TEOcc.
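
The frame-discarding step and the training-only nature of the temporal enhancement branch can be illustrated with a minimal sketch. It is not taken from the TEOcc code base: the function names, the `short_window` parameter, and the rule that splits the remaining frames into short-term and long-term context by index distance are all assumptions made for illustration.

```python
import random


def sample_dropped_frame(num_frames, rng, short_window=2):
    """Pick one frame to hide and split the remaining frames by distance.

    Assumptions (hypothetical, not from the paper): frames are indexed
    0..num_frames-1, and "short-term" context means frames within
    `short_window` steps of the dropped frame; all others are "long-term".
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to drop one")
    dropped = rng.randrange(num_frames)
    others = [i for i in range(num_frames) if i != dropped]
    short_term = [i for i in others if abs(i - dropped) <= short_window]
    long_term = [i for i in others if abs(i - dropped) > short_window]
    return dropped, short_term, long_term


def active_branches(training):
    """Return the branches to run; the temporal branch is training-only."""
    return ["main", "temporal_enhancement"] if training else ["main"]


if __name__ == "__main__":
    rng = random.Random(0)  # seeded so that runs are repeatable
    dropped, short_term, long_term = sample_dropped_frame(8, rng)
    print("dropped frame:", dropped)
    print("short-term context:", short_term)
    print("long-term context:", long_term)
    print("training branches:", active_branches(True))
    print("inference branches:", active_branches(False))
```

In a real model, both branches would feed the same occupancy prediction head, and the occupancy predicted for the dropped frame would be supervised alongside the main branch output during training only.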

AFOcc: Multi-Modal Semantic Occupancy Prediction with Accurate Fusion

OccFusion: Multi-Sensor Fusion Framework for 3D Semantic Occupancy Prediction

OccLoff: Learning Optimized Feature Fusion for 3D Occupancy Prediction

PMAFusion: Projection-Based Multi-Modal Alignment for 3D Semantic Occupancy Prediction

Co-Occ: Coupling Explicit Feature Fusion with Volume Rendering Regularization for Multi-Modal 3D Semantic Occupancy Prediction

DAOcc: 3D Object Detection Assisted Multi-Sensor Fusion for 3D Occupancy Prediction

OccFusion: Depth Estimation Free Multi-sensor Fusion for 3D Occupancy Prediction

AEFF-SSC: An Attention-Enhanced Feature Fusion for 3D Semantic Scene Completion

$α$-OCC: Uncertainty-Aware Camera-based 3D Semantic Occupancy Prediction

ALOcc: Adaptive Lifting-based 3D Semantic Occupancy and Cost Volume-based Flow Prediction

AdaOcc: Adaptive-Resolution Occupancy Prediction

Collaborative Semantic Occupancy Prediction with Hybrid Feature Fusion in Connected Automated Vehicles

Offboard Occupancy Refinement with Hybrid Propagation for Autonomous Driving

Multi-Sem Fusion: Multimodal Semantic Fusion for 3D Object Detection

AFTR: A Robustness Multi-Sensor Fusion Model for 3D Object Detection Based on Adaptive Fusion Transformer

RGB and LiDAR Fusion-based 3D Semantic Segmentation for Autonomous Driving

TEOcc: Radar-camera Multi-modal Occupancy Prediction via Temporal Enhancement

SurroundOcc: Multi-Camera 3D Occupancy Prediction for Autonomous Driving

Occlusion-Guided Multi-Modal Fusion for Vehicle-Infrastructure Cooperative 3D Object Detection

LiCROcc: Teach Radar for Accurate Semantic Occupancy Prediction using LiDAR and Camera