Point-Voxel Fusion for 3D Object Detection

Wei Wu, Yisha Liu, Weimin Xue, Yan Zhuang
DOI: https://doi.org/10.23919/ccc58697.2023.10240949
2023-01-01
Abstract: In 3D object detection, prediction accuracy depends heavily on the richness of the point cloud features, which in turn depends on the fine-grained features the network extracts. Some current methods encode the point cloud as voxels and repeatedly downsample them with 3D convolutions, which improves detection efficiency but loses too many fine-grained features. Other methods feed the raw point cloud directly into a multi-layer perceptron (MLP) for feature extraction, which retains more fine-grained features but greatly reduces detection efficiency. This work combines voxel features and point features to obtain a fused 3D feature map. An attention module that combines semantic and spatial features then processes this fused map, producing a richer 3D feature structure that reduces the loss of Z-axis features. Because the geometric structure of objects is important for detection, we design a geometry-oriented auxiliary network that is jointly optimized with the backbone by supervising two tasks during training; it guides the backbone to learn the structural features of targets and is discarded at inference. Experiments show that the proposed method outperforms some previous methods on KITTI 3D and BEV detection.
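The idea of combining voxel features with point features can be illustrated with a minimal sketch. The abstract does not specify the fusion operator, so everything here is an assumption made for illustration: the helper names `voxel_index` and `fuse_point_voxel` are hypothetical, voxel features are taken as the mean of the point features in each voxel, and fusion is plain concatenation. The paper's actual network (3D convolutions, MLP, attention module) is not reproduced.

```python
from collections import defaultdict
from math import floor


def voxel_index(point, voxel_size):
    """Map an (x, y, z) point to the integer index of its voxel."""
    return tuple(floor(c / s) for c, s in zip(point, voxel_size))


def fuse_point_voxel(points, features, voxel_size):
    """Concatenate each point's feature with the mean feature of its voxel.

    Hypothetical helper for illustration only; mean pooling and
    concatenation are assumptions, not the paper's fusion operator.
    """
    # Group the per-point features by the voxel each point falls into.
    groups = defaultdict(list)
    for p, f in zip(points, features):
        groups[voxel_index(p, voxel_size)].append(f)

    # Coarse voxel feature: element-wise mean over the points in the voxel.
    voxel_feat = {
        idx: [sum(col) / len(fs) for col in zip(*fs)]
        for idx, fs in groups.items()
    }

    # Fused feature: fine-grained point feature followed by voxel context.
    return [
        list(f) + voxel_feat[voxel_index(p, voxel_size)]
        for p, f in zip(points, features)
    ]


# Toy input: the first two points share a voxel, the third is alone.
points = [(0.1, 0.2, 0.0), (0.3, 0.1, 0.1), (1.2, 0.0, 0.0)]
features = [[1.0, 0.0], [3.0, 2.0], [5.0, 5.0]]
fused = fuse_point_voxel(points, features, (0.5, 0.5, 0.5))
```

In this toy setup each fused vector keeps the point's own feature (the fine-grained part) and appends the pooled feature of its voxel (the coarse part), which is the trade-off the abstract describes between the two families of methods.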