Abstract: Existing 3D deep learning methods adopt either individual point-based features or local-neighboring voxel-based features, and demonstrate great potential for processing 3D data. However, point-based models are inefficient due to the unordered nature of point clouds, and voxel-based models suffer from large information loss. Motivated by the success of recent point-voxel representations such as PVCNN, we propose a new convolutional neural network, called Multi Point-Voxel Convolution (MPVConv), for deep learning on point clouds. Integrating the advantages of both voxel-based and point-based methods, MPVConv can effectively increase the neighboring collection between point-based features and also promote independence among voxel-based features. Moreover, most existing approaches aim at solving one specific task, and only a few of them can handle a variety of tasks. By simply replacing the corresponding convolution module with MPVConv, we show that MPVConv can fit into different backbones to solve a wide range of 3D tasks. Extensive experiments on benchmark datasets such as ShapeNet Part, S3DIS and KITTI for various tasks show that MPVConv improves the accuracy of the backbone (PointNet) by up to **36%**, and achieves higher accuracy than the voxel-based model with up to **34×** speedups. In addition, MPVConv outperforms the state-of-the-art point-based models with up to **8×** speedups. Notably, MPVConv achieves better accuracy than the newest point-voxel-based model PVCNN (a model more efficient than PointNet) with lower latency.

What problem does this paper attempt to address?

This paper addresses the shortcomings of existing 3D deep learning methods when processing point cloud data. Specifically:

1. **Efficiency problem of point-based models**: Because point cloud data is unordered, point-based models are inefficient on high-resolution models and require expensive random memory access and dynamic kernel computation.
2. **Information loss problem of voxel-based models**: Voxel-based models convert irregular, sparse point clouds into regular 3D grids. Although this lets them apply widely studied convolutional neural networks (CNNs), their performance depends heavily on the voxelization resolution. A low resolution causes a large amount of information loss, while a high resolution requires a large amount of GPU memory and computing time.
3. **Generality problem of existing methods**: Most existing methods target one specific task, and only a few can handle multiple tasks. They are usually applicable only within a specific task framework, such as 3D object detection, 3D auto-encoding CAD construction, or 3D semantic segmentation.

To solve these problems, the authors propose a new convolutional neural network, Multi Point-Voxel Convolution (MPVConv), for deep learning on point clouds. MPVConv combines the advantages of voxel-based and point-based methods and can be used in different backbone networks to solve a wide range of 3D tasks, improving both accuracy and efficiency.

### Main contributions:

1. **Improve neighborhood collection of point-based features**: By applying a 3D CNN and an MLP simultaneously on points and voxels, MPVConv can increase the neighborhood collection between point-based features.
2. **Enhance independence of voxel-based features**: Through the same mechanism, MPVConv can promote independence among voxel-based features.
3. **Wide applicability**: MPVConv can be applied to different backbone networks to solve various 3D tasks, such as 3D part segmentation, indoor scene segmentation, and 3D object detection.
4. **Significant performance improvement**: Experimental results show that MPVConv outperforms existing point-based, voxel-based, and point-voxel-based methods on multiple benchmark datasets (such as ShapeNet Part, S3DIS, and KITTI), with a significant speed improvement as well.

In conclusion, by proposing MPVConv, this paper aims to overcome the limitations of existing 3D deep learning methods on point cloud data and to provide a more efficient, more accurate, and more general-purpose solution.
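The voxel-resolution trade-off described above can be illustrated with a small NumPy sketch. This is not the authors' MPVConv implementation: the function names `voxelize_mean` and `devoxelize_nearest` are hypothetical, the point cloud is random, and nearest-voxel lookup stands in for whatever interpolation a real point-voxel model would use. It only shows that a coarse grid merges many points into one averaged voxel (information loss), while a finer grid keeps points distinct at a cubic cost in cells.

```python
import numpy as np


def voxelize_mean(points, features, resolution):
    """Average per-point features into a resolution^3 grid (illustrative helper)."""
    # Normalize coordinates into [0, 1] using the largest axis extent.
    mins = points.min(axis=0)
    extent = (points.max(axis=0) - mins).max()
    extent = extent if extent > 0 else 1.0
    normalized = (points - mins) / extent

    # Map each point to an integer voxel index, clipping the upper boundary.
    idx = np.clip((normalized * resolution).astype(np.int64), 0, resolution - 1)
    flat = idx[:, 0] * resolution * resolution + idx[:, 1] * resolution + idx[:, 2]

    # Accumulate feature sums and point counts per voxel, then take the mean.
    sums = np.zeros((resolution ** 3, features.shape[1]))
    counts = np.zeros(resolution ** 3)
    np.add.at(sums, flat, features)
    np.add.at(counts, flat, 1)
    occupied = counts > 0
    sums[occupied] /= counts[occupied, None]
    return sums, counts, flat


def devoxelize_nearest(grid, flat):
    """Give every point the feature of the voxel it fell into (a simplification)."""
    return grid[flat]


# Assumed toy data: 2048 random points with 4 feature channels each.
rng = np.random.default_rng(0)
points = rng.random((2048, 3))
features = rng.standard_normal((2048, 4))

for r in (4, 16, 32):
    grid, counts, flat = voxelize_mean(points, features, r)
    restored = devoxelize_nearest(grid, flat)
    # Mean squared difference between original and round-tripped features.
    error = float(np.mean((restored - features) ** 2))
    print(f"r={r:2d} cells={r ** 3:6d} occupied={int((counts > 0).sum()):5d} "
          f"round-trip error={error:.3f}")
```

At low resolution most points share a voxel, so the round trip through the grid blurs their features together; raising the resolution reduces that blurring but the number of grid cells grows as r³, which matches the memory and compute concern raised for voxel-based models.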

Multi Point-Voxel Convolution (MPVConv) for Deep Learning on Point Clouds

Continuous Volumetric Convolution Network with Self-Learning Kernels for Point Clouds

Multi Voxel-Point Neurons Convolution (MVPConv) for Fast and Accurate 3D Deep Learning

MPVNN: Multi-resolution Point-Voxel Non-parametric Network for 3D Point Cloud Processing

Point-Voxel CNN for Efficient 3D Deep Learning

RepPVConv: attentively fusing reparameterized voxel features for efficient 3D point cloud perception

PointConv: Deep Convolutional Networks on 3D Point Clouds

VTPNet for 3D deep learning on point cloud

The Devils in the Point Clouds: Studying the Robustness of Point Cloud Convolutions

PointVGG: Graph Convolutional Network with Progressive Aggregating Features on Point Clouds.

PointVoxelFormer -- Reviving point cloud networks for 3D medical imaging

PointConvFormer: Revenge of the Point-based Convolution

PVConvNet: Pixel-Voxel Sparse Convolution for multimodal 3D object detection

PyraPVConv: Efficient 3D Point Cloud Perception with Pyramid Voxel Convolution and Sharable Attention

PVT: Point-Voxel Transformer for 3D Deep Learning

Multi-View PointNet for 3D Scene Understanding

Pointwise Convolutional Neural Networks

PVT: Point-Voxel Transformer for Point Cloud Learning

Optimized CNNs for Rapid 3D Point Cloud Object Recognition

P2V-RCNN: Point to Voxel Feature Learning for 3D Object Detection From Point Clouds