Point-Voxel Fusion for Multimodal 3D Detection

Ke Wang, Zhichuang Zhang
DOI: https://doi.org/10.1109/iv51971.2022.9827226
2022-01-01
Abstract: Many LiDAR-based methods have achieved encouraging results on 3D detection tasks, but detecting small objects such as pedestrians remains challenging. In contrast, small objects are easier to detect in camera images. Existing methods for fusing point cloud and image features are dominated by the point cloud, and because the point cloud is sparse, some image information is lost. We propose a new fusion method, PVFusion, that aims to fuse more image features. We first assign each point to its own perspective voxel and project the voxel onto the image feature maps. The semantic feature of the perspective voxel is then fused with the geometric feature of the point. We design a 3D object detection model using PVFusion. During training, we employ ground truth paste (GT-Paste) data augmentation and solve the occlusion problem caused by newly added objects.
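The fusion step described in the abstract can be illustrated with a rough sketch. The abstract does not specify the shape of the perspective voxel, how image features are pooled inside it, or the fusion operator, so everything below is an assumption made for illustration: a pinhole projection matrix `P`, a square pixel window standing in for the voxel's image footprint, mean pooling, and concatenation as the fusion. The function name `fuse_point_with_image` and the parameter `half_window` are hypothetical and do not come from the paper.

```python
import numpy as np


def fuse_point_with_image(points, point_feats, image_feats, P, half_window=1):
    """Illustrative per-point fusion of image and point features.

    points:      (N, 3) point coordinates in the camera frame.
    point_feats: (N, Cp) geometric feature per point.
    image_feats: (H, W, Ci) image feature map.
    P:           (3, 4) projection matrix to feature-map pixel coordinates.
    half_window: assumed half-size, in pixels, of each voxel's footprint.
    Returns an (N, Cp + Ci) array of fused features.
    """
    n = points.shape[0]
    h, w, ci = image_feats.shape

    # Project every point (the centre of its own voxel) onto the feature map.
    homogeneous = np.hstack([points, np.ones((n, 1))])
    uvw = homogeneous @ P.T
    depth = uvw[:, 2]

    semantic = np.zeros((n, ci), dtype=image_feats.dtype)
    for i in range(n):
        if depth[i] <= 0:
            continue  # behind the camera: no image evidence, keep zeros
        u = int(round(uvw[i, 0] / depth[i]))
        v = int(round(uvw[i, 1] / depth[i]))
        # Assumed footprint: a square window clipped to the feature map.
        u0, u1 = max(u - half_window, 0), min(u + half_window + 1, w)
        v0, v1 = max(v - half_window, 0), min(v + half_window + 1, h)
        if u0 >= u1 or v0 >= v1:
            continue  # footprint lies entirely outside the image
        # Mean-pool the image features covered by the footprint.
        semantic[i] = image_feats[v0:v1, u0:u1].reshape(-1, ci).mean(axis=0)

    # Assumed fusion operator: concatenate geometric and semantic features.
    return np.concatenate([point_feats, semantic], axis=1)
```

Pooling over a window rather than sampling a single pixel is one way a sparse point could pick up more of the surrounding image context, which is the motivation the abstract gives. The actual voxel geometry and fusion used by PVFusion may differ from this sketch.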