Multi Point-Voxel Convolution (MPVConv) for Deep Learning on Point Clouds

Wei Zhou, Xin Cao, Xiaodan Zhang, Xingxing Hao, Dekui Wang, Ying He
DOI: https://doi.org/10.48550/arXiv.2107.13152
2021-07-28
Abstract: Existing 3D deep learning methods adopt either individual point-based features or local-neighboring voxel-based features, and demonstrate great potential for processing 3D data. However, point-based models are inefficient due to the unordered nature of point clouds, and voxel-based models suffer from large information loss. Motivated by the success of recent point-voxel representations such as PVCNN, we propose a new convolutional neural network, called Multi Point-Voxel Convolution (MPVConv), for deep learning on point clouds. Integrating the advantages of both voxel-based and point-based methods, MPVConv can effectively increase the neighboring collection between point-based features and also promote independence among voxel-based features. Moreover, most existing approaches aim at solving one specific task, and only a few of them can handle a variety of tasks. By simply replacing the corresponding convolution module with MPVConv, we show that MPVConv can fit in different backbones to solve a wide range of 3D tasks. Extensive experiments on benchmark datasets such as ShapeNet Part, S3DIS and KITTI for various tasks show that MPVConv improves the accuracy of the backbone (PointNet) by up to 36%, and achieves higher accuracy than the voxel-based model with up to 34× speedups. In addition, MPVConv outperforms the state-of-the-art point-based models with up to 8× speedups. Notably, MPVConv achieves better accuracy than the newest point-voxel-based model PVCNN (a model more efficient than PointNet) with lower latency.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the shortcomings of existing 3D deep learning methods when processing point-cloud data. Specifically:

1. **Efficiency problem of point-based models**: Because point-cloud data is unordered, point-based models are inefficient on high-resolution models and require expensive random memory access and dynamic kernel computation.
2. **Information loss problem of voxel-based models**: Voxel-based models convert irregular, sparse point clouds into regular 3D grids. Although this lets them apply widely studied convolutional neural networks (CNNs), their performance depends heavily on the voxelization resolution. A low resolution loses a large amount of information, while a high resolution requires a large amount of GPU memory and computing time.
3. **Generality problem of existing methods**: Most existing methods target one specific task, and only a few can handle multiple tasks. They are usually applicable only within a specific task framework, such as 3D object detection, 3D auto-encoding CAD construction, or 3D semantic segmentation.

To solve these problems, the authors propose a new convolutional neural network, Multi Point-Voxel Convolution (MPVConv), for deep learning on point clouds. MPVConv combines the advantages of voxel-based and point-based methods and can be used in different backbone networks to solve a wide range of 3D tasks, improving both accuracy and efficiency.

### Main contributions:

1. **Improve neighborhood collection of point-based features**: By applying a 3D CNN and an MLP simultaneously on points and voxels, MPVConv can increase the neighborhood collection between point-based features.
2. **Enhance independence of voxel-based features**: Through the same mechanism, MPVConv can promote independence among voxel-based features.
3. **Wide applicability**: MPVConv can be applied to different backbone networks to solve various 3D tasks, such as 3D part segmentation, indoor scene segmentation, and 3D object detection.
4. **Significant performance improvement**: Experimental results show that MPVConv outperforms existing point-based, voxel-based, and point-voxel-based methods on multiple benchmark datasets (such as ShapeNet Part, S3DIS, and KITTI), with a significant speed improvement.

In conclusion, by proposing MPVConv, this paper aims to overcome the limitations of existing 3D deep learning methods on point-cloud data and to provide a more efficient, accurate, and general-purpose solution.
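To make the point-voxel idea more concrete, the following is a minimal pure-Python sketch of a generic fusion step in the spirit of PVCNN-style models. It is not the authors' MPVConv implementation. The function names (`voxel_index`, `voxelize`, `point_voxel_fuse`), the scalar features, the nearest-voxel devoxelization, and the additive fusion are all assumptions made for illustration. The learned 3D convolution and MLP weights are omitted: the voxel branch only averages features per voxel, and the point branch applies a caller-supplied function. The sketch also shows how a low voxelization resolution merges nearby points into one cell, which is the information loss described above.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxel_index(p: Point, resolution: int) -> Voxel:
    """Map a point with coordinates in [0, 1) to an integer grid cell."""
    return tuple(min(int(c * resolution), resolution - 1) for c in p)


def voxelize(points: Sequence[Point], feats: Sequence[float],
             resolution: int) -> Dict[Voxel, float]:
    """Average the features of all points that fall into the same voxel."""
    buckets: Dict[Voxel, List[float]] = defaultdict(list)
    for p, f in zip(points, feats):
        buckets[voxel_index(p, resolution)].append(f)
    return {v: sum(fs) / len(fs) for v, fs in buckets.items()}


def point_voxel_fuse(points: Sequence[Point], feats: Sequence[float],
                     resolution: int,
                     point_fn: Callable[[float], float]) -> List[float]:
    """Coarse voxel branch plus fine per-point branch, fused by addition."""
    grid = voxelize(points, feats, resolution)
    # Devoxelize by nearest-voxel lookup (an assumption; real models
    # typically interpolate between neighboring voxels instead).
    voxel_branch = [grid[voxel_index(p, resolution)] for p in points]
    # Per-point branch: stands in for a learned per-point MLP.
    point_branch = [point_fn(f) for f in feats]
    return [v + q for v, q in zip(voxel_branch, point_branch)]


# Hypothetical toy data: three points with one scalar feature each.
points = [(0.1, 0.1, 0.1), (0.2, 0.2, 0.2), (0.9, 0.9, 0.9)]
feats = [1.0, 3.0, 10.0]

# At resolution 2 the first two points share a voxel; at 8 they do not.
print(len(voxelize(points, feats, 2)), len(voxelize(points, feats, 8)))
print(point_voxel_fuse(points, feats, 2, lambda f: 0.5 * f))
```

In this toy setting, the voxel branch supplies neighborhood context (points in the same cell see a shared average), while the point branch keeps each point's individual feature. That is the complementary behavior the summary attributes to combining the two representations.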