BEVoxSeg: BEV-Voxel Representation for Fast and Accurate Camera-Based 3D Segmentation

Haiyi Liu, Beibei Wang, Lu Zhang, Jianmin Ji, Yanyong Zhang
DOI: https://doi.org/10.1109/icassp48485.2024.10447498
2024-01-01
Abstract: Recent research has demonstrated the advantages of Bird’s-eye-view (BEV) representation in 3D perception. However, because it lacks height information, a BEV representation alone is insufficient to accurately reconstruct the complete surrounding 3D scene. Voxel representation, on the other hand, excels at describing 3D structure, but its memory and computational costs make fast inference difficult. To address these limitations, we propose BEVoxSeg, a method that leverages the computational efficiency of BEV methods while incorporating essential geometric information from voxel features. By combining the advantages of both representations, our approach achieves state-of-the-art results for LiDAR semantic segmentation on nuScenes and demonstrates superior performance on the occupancy prediction task on the Occ3D-nuScenes dataset.
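To make the cost argument concrete, here is a minimal sketch comparing dense float32 feature-map memory for a BEV plane, a full voxel volume, and a hypothetical hybrid that keeps a full-width BEV plane plus a narrow voxel branch for height. The grid size, channel counts, and the hybrid layout are illustrative assumptions, not BEVoxSeg's actual configuration.

```python
# Illustrative memory comparison; all sizes are assumptions, not values from BEVoxSeg.
BYTES_PER_FLOAT32 = 4


def feature_bytes(channels: int, *spatial: int) -> int:
    """Bytes for one dense float32 feature map of shape (channels, *spatial)."""
    n = channels
    for s in spatial:
        n *= s
    return n * BYTES_PER_FLOAT32


def mib(n_bytes: int) -> float:
    """Convert bytes to mebibytes."""
    return n_bytes / 2**20


X, Y, Z = 200, 200, 16  # assumed grid resolution (Z = number of height bins)
C = 128                  # assumed channel width for BEV and full voxel features
C_THIN = 16              # assumed channel width of a hypothetical thin voxel branch

bev = feature_bytes(C, X, Y)             # height collapsed: cheap, no vertical detail
voxel = feature_bytes(C, X, Y, Z)        # full 3D volume: Z times the BEV cost
hybrid = bev + feature_bytes(C_THIN, X, Y, Z)  # hypothetical BEV + thin voxel branch

if __name__ == "__main__":
    print(f"BEV    : {mib(bev):7.2f} MiB")
    print(f"Voxel  : {mib(voxel):7.2f} MiB")
    print(f"Hybrid : {mib(hybrid):7.2f} MiB")
```

At equal channel width the voxel volume costs exactly Z times the BEV plane, which is why keeping most channels in BEV and only a thin voxel branch for height can retain geometric detail at a fraction of the full-voxel cost. The abstract does not state whether BEVoxSeg splits features this way; the hybrid here only illustrates the trade-off.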