AdaptiveOcc: Adaptive Octree-based Network for Multi-Camera 3D Semantic Occupancy Prediction in Autonomous Driving

Tianyu Yang, Yeqiang Qian, Weihao Yan, Chunxiang Wang, Ming Yang
DOI: https://doi.org/10.1109/tcsvt.2024.3492289
IF: 5.859
2024-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Multi-camera 3D semantic occupancy prediction is a critical task for autonomous driving, playing a vital role in understanding the environment. Current methods mainly rely on a uniform voxel representation to encode space, which greatly limits their resolution scalability: because the number of uniform voxels grows cubically with resolution, scaling to finer granularities sharply increases the demand for computational and storage resources. To address this, we propose AdaptiveOcc, a multi-level hierarchical model. Using an octree structure, our model adaptively represents different parts of space with varying voxel granularity. It selectively extends resolution for only a small subset of voxels, thus mitigating the substantial computational and storage burden brought by scaling. To endow our model with adaptability, we propose a distance-adaptive octree construction rule for generating supervised labels. Because the voxel granularity requirements of environmental perception vary across distance ranges, this construction rule makes coarser granularity more likely for distant regions and finer granularity more likely for nearby regions. This ensures a more efficient and rational allocation of computational resources, further reducing the inference latency. Extensive experiments on the nuScenes, SemanticKITTI, and Waymo datasets validate that our method can scale to finer granularities with faster speed and less training memory than other state-of-the-art methods. Our code is available at https://github.com/yty-sky/AdaptiveOcc.
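The scaling argument in the abstract can be illustrated with a rough back-of-the-envelope sketch. The code below is not the authors' implementation: the function names, the distance thresholds (`near_m`, `mid_m`), and the purely distance-based refinement rule are assumptions made for illustration. The abstract only states that the actual rule makes finer granularity more likely nearby, not that it is a hard threshold. The sketch compares the voxel count of a uniform grid with the leaf count of an octree-style grid whose coarse cells are subdivided more deeply close to the ego vehicle.

```python
import math


def uniform_voxel_count(cells_per_axis: int) -> int:
    """Voxels in a cubic uniform grid; halving the voxel size multiplies this by 8."""
    return cells_per_axis ** 3


def depth_for_distance(distance_m: float, near_m: float = 20.0, mid_m: float = 40.0) -> int:
    """Hypothetical rule: allow deeper octree subdivision closer to the ego vehicle."""
    if distance_m < near_m:
        return 2  # two subdivisions: 64 leaves per coarse cell
    if distance_m < mid_m:
        return 1  # one subdivision: 8 leaves per coarse cell
    return 0      # keep the coarse cell as a single leaf


def adaptive_leaf_count(cells_per_axis: int, cell_size_m: float,
                        near_m: float = 20.0, mid_m: float = 40.0) -> int:
    """Count octree leaves when each coarse cell is refined according to its distance."""
    half_extent = cells_per_axis * cell_size_m / 2.0
    total = 0
    for i in range(cells_per_axis):
        for j in range(cells_per_axis):
            for k in range(cells_per_axis):
                # Cell center in a frame whose origin is the ego vehicle.
                cx = (i + 0.5) * cell_size_m - half_extent
                cy = (j + 0.5) * cell_size_m - half_extent
                cz = (k + 0.5) * cell_size_m - half_extent
                distance = math.sqrt(cx * cx + cy * cy + cz * cz)
                total += 8 ** depth_for_distance(distance, near_m, mid_m)
    return total


if __name__ == "__main__":
    coarse = 20          # assumed 20 coarse cells per axis
    size = 4.0           # assumed 4 m coarse cells, i.e. an 80 m cube
    finest = coarse * 4  # uniform grid at the finest (depth-2) resolution
    print("uniform, finest resolution:", uniform_voxel_count(finest))
    print("adaptive octree leaves:    ", adaptive_leaf_count(coarse, size))
```

Under these assumptions the adaptive grid reaches the same finest voxel size near the vehicle while storing fewer cells overall, which is the trade-off the abstract describes. A real octree would also condition subdivision on scene content, which this sketch leaves out.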