3DOPFormer: 3D Occupancy Perception from Multi-Camera Images with Directional and Distance Enhancement

Chuandong Lyu, Shengbang Guo, Bin Zhou, Hailiang Xiong, Hongchao Zhou
DOI: https://doi.org/10.1109/tiv.2023.3343749
IF: 8.2
2023-01-01
IEEE Transactions on Intelligent Vehicles
Abstract: Vision-based 3D scene perception and understanding are crucial for autonomous driving, robot navigation, and obstacle avoidance. However, traditional 3D object detection struggles to describe objects with arbitrary shapes. To better perceive indoor and outdoor scenes, we propose 3DOPFormer, a method that achieves 3D occupancy perception for autonomous driving using vision alone. Specifically, it first extracts multi-camera image features and lifts the 2D image features to 3D volume features through implicit learning with spatial occupancy cross-attention. It then upsamples the volume features with 3D deconvolution. For the first time, we integrate the inherent directional and distance properties of LiDAR rays into the 3D occupancy prediction task, leading to outstanding occupancy prediction performance. We progressively learn dense visual occupancy perception from sparse point cloud supervision and neural rendering, without relying on expensive dense occupancy annotations for training. Additionally, we introduce a cost-effective data collection scheme and use it to create ML3DOP, a dataset that covers both indoor and outdoor scenes and was captured with 4-eye cameras and a 16-beam LiDAR for 3D occupancy perception. The experimental results demonstrate that our approach achieves accurate 3D occupancy prediction using only multiple cameras. Dataset: https://github.com/lvchuandong/ML3DOP
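The abstract does not give the rendering equations, so the following is only a minimal sketch of how sparse LiDAR ranges could supervise predicted occupancy along a ray. It assumes a generic occupancy-based rendering rule (each sample's weight is its occupancy probability times the probability that the ray was not stopped earlier) and an absolute-error distance loss. The function names `render_ray_distance` and `distance_loss` are hypothetical and are not taken from the paper or its repository.

```python
from typing import List, Tuple


def render_ray_distance(
    occupancy: List[float], distances: List[float]
) -> Tuple[float, List[float]]:
    """Render an expected hit distance along one ray.

    occupancy: predicted occupancy probability at each sample, near to far.
    distances: distance of each sample from the sensor origin.
    Assumed rule (not confirmed by the source): weight_i = T_i * o_i,
    where T_i is the product of (1 - o_j) over all earlier samples.
    """
    if len(occupancy) != len(distances):
        raise ValueError("occupancy and distances must have the same length")

    weights = []
    transmittance = 1.0  # probability that the ray is still unobstructed
    for occ in occupancy:
        weights.append(transmittance * occ)
        transmittance *= 1.0 - occ

    expected = sum(w * t for w, t in zip(weights, distances))
    return expected, weights


def distance_loss(predicted: float, lidar_range: float) -> float:
    """Absolute error between the rendered distance and the LiDAR range."""
    return abs(predicted - lidar_range)


# Usage: four samples along one ray; the third sample is fully occupied.
expected, weights = render_ray_distance(
    [0.0, 0.0, 1.0, 0.5], [1.0, 2.0, 3.0, 4.0]
)
print(weights)                         # [0.0, 0.0, 1.0, 0.0]
print(expected)                        # 3.0
print(distance_loss(expected, 3.5))    # 0.5
```

In this sketch the sample behind the fully occupied one receives zero weight, which is how a single LiDAR return can constrain both free space in front of the hit and the surface location itself. The weights need not sum to one when the ray is never fully blocked.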