AFOcc: Multi-Modal Semantic Occupancy Prediction with Accurate Fusion

Wenbo Chu, Keyong Wang, Guofa Li, Bing Lu, Xiangyun Ren, Ling Zheng, Xiaolin Tang, Keqiang Li
DOI: https://doi.org/10.1109/jsen.2024.3449349
IF: 4.3
2024-01-01
IEEE Sensors Journal
Abstract: 3D scene perception technology enhances the safety and decision-making ability of autonomous vehicles by accurately acquiring and analyzing stereo information from the environment. Compared to traditional object detection methods, semantic occupancy perception offers greater flexibility in describing 3D scenes with arbitrary shapes and various categories. However, existing methods for semantic occupancy perception face challenges such as poor generalization of depth estimation and inaccurate alignment and fusion of multimodal features. In this paper, a novel multimodal semantic occupancy prediction method, AFOcc, is proposed. AFOcc addresses these challenges by adopting a fusion technique based on feature alignment and an attention mechanism. The method extracts multi-scale features from image and LiDAR data, encodes LiDAR voxel features using sparse convolution, and projects them onto 2D image features for precise alignment. The projection of point clouds onto images is achieved through a feature alignment module. Finally, a learnable fusion module adaptively adjusts the weights of different modal features to enhance the fusion effect. Extensive experiments on the nuScenes-Occupancy dataset demonstrate that AFOcc significantly outperforms state-of-the-art methods in terms of the mIoU metric. Notably, in the bicycle and motorcycle categories, an IoU improvement of more than 40% is achieved. These results illustrate the superior perception and robustness capabilities of AFOcc in complex scenes.
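The two steps the abstract describes, projecting LiDAR points onto 2D image features and fusing the modalities with adaptive weights, can be illustrated with a minimal sketch. This is not the AFOcc implementation: the paper's modules are learned networks operating on multi-scale voxel features, whereas the code below assumes a plain pinhole camera model, nearest-neighbour feature lookup, and a two-way softmax gate. All function names and parameters (`project_point`, `sample_feature`, `fuse`, `w_lidar`, `w_image`) are hypothetical and chosen only for illustration.

```python
import math


def project_point(point, fx, fy, cx, cy):
    """Project a 3D point (camera coordinates) to pixel coordinates.

    Assumes a pinhole model with focal lengths (fx, fy) and principal
    point (cx, cy). Returns None for points at or behind the camera plane.
    """
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def sample_feature(feature_map, u, v):
    """Look up the feature vector at the pixel nearest to (u, v).

    feature_map is indexed as feature_map[row][col]; returns None when
    the projected location falls outside the map.
    """
    row, col = int(round(v)), int(round(u))
    if 0 <= row < len(feature_map) and 0 <= col < len(feature_map[0]):
        return feature_map[row][col]
    return None


def softmax(scores):
    """Numerically stable softmax over a list of scalars."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fuse(lidar_feat, image_feat, w_lidar, w_image):
    """Fuse two aligned feature vectors with adaptive modality weights.

    Each modality is scored by a dot product with a weight vector (a
    stand-in for learned parameters), the two scores are normalised
    with softmax, and the features are combined as a weighted sum.
    Returns the fused vector and the (lidar, image) weights.
    """
    score_lidar = sum(w * f for w, f in zip(w_lidar, lidar_feat))
    score_image = sum(w * f for w, f in zip(w_image, image_feat))
    a_lidar, a_image = softmax([score_lidar, score_image])
    fused = [a_lidar * l + a_image * i for l, i in zip(lidar_feat, image_feat)]
    return fused, (a_lidar, a_image)
```

In this sketch, a LiDAR point is first mapped to a pixel with `project_point`, the image feature at that pixel is fetched with `sample_feature`, and `fuse` then blends it with the point's own feature. With all-zero weight vectors the gate reduces to a plain average of the two modalities, which makes the effect of the weights easy to check by hand.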