Abstract: The semantic segmentation of point clouds is an important part of environment perception for robots. However, it is difficult to directly apply traditional 3D convolution kernels to extract features from raw 3D point clouds because of their unstructured nature. In this paper, a spherical interpolated convolution operator is proposed to replace the traditional grid-shaped 3D convolution operator. This new feature extraction operator improves the accuracy of the network and reduces its number of parameters. In addition, this paper analyzes the shortcomings of point cloud interpolation methods that use distance as the interpolation weight and proposes a self-learned distance-feature density that combines distance and feature correlation. The proposed method makes feature extraction in the spherical interpolated convolution network more rational and effective. The effectiveness of the proposed network is demonstrated on the 3D semantic segmentation task of point clouds. Experiments show that the proposed method achieves good performance on the ScanNet and Paris-Lille-3D datasets.
What problem does this paper attempt to address?
This paper attempts to solve several key problems in point cloud semantic segmentation:
1. **Unstructured nature of point clouds**: Traditional 3D convolution kernels are hard to apply directly to raw 3D point clouds because point cloud data is unstructured. The paper proposes a spherical interpolated convolution operator to replace the traditional grid-like 3D convolution operator so that point cloud features can be extracted more effectively.
2. **Reducing network parameters**: With the new spherical interpolated convolution operator, the paper reduces the number of network parameters, which lowers memory usage while also improving network accuracy.
3. **Defects in point cloud interpolation methods**: Distance-based point cloud interpolation methods have defects; for example, the interpolation weights are inconsistent across regions of different density. The paper proposes a self-learned distance-feature density that combines distance and feature correlation, making feature extraction more reasonable and effective.
4. **Improving semantic segmentation performance**: The proposed method performs well on 3D semantic segmentation tasks. In particular, the experimental results on the ScanNet and Paris-Lille-3D datasets show that it is competitive.
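To make the density problem in item 3 concrete, here is a minimal numpy sketch. It assumes normalized inverse-distance weighting (one common distance-based scheme; the exact weighting used in the paper is not stated here) and a hypothetical toy neighborhood: eight points clustered on one side of a query point and a single point on the other side, all at the same distance. Because each neighbor receives the same per-point weight, the dense side ends up with 8/9 of the total weight simply because it contains more points.

```python
import numpy as np

def inverse_distance_weights(query, neighbors, eps=1e-8):
    """Normalized inverse-distance weights of each neighbor relative to the query point."""
    d = np.linalg.norm(neighbors - query, axis=1)
    w = 1.0 / (d + eps)
    return w / w.sum()

# Hypothetical toy neighborhood around the origin, every neighbor at distance 1:
# a dense cluster of 8 points near +x and a single point at -x.
query = np.zeros(3)
angles = np.linspace(-0.1, 0.1, 8)
dense = np.stack([np.cos(angles), np.sin(angles), np.zeros(8)], axis=1)
sparse = np.array([[-1.0, 0.0, 0.0]])
neighbors = np.vstack([dense, sparse])

# The dense side carries feature value 1, the sparse side carries 0.
features = np.concatenate([np.ones(8), np.zeros(1)])

w = inverse_distance_weights(query, neighbors)
print(w[:8].sum(), w[8])   # about 0.889 vs about 0.111
print(w @ features)        # about 0.889: the interpolated value follows the dense side
```

The distance-feature density is intended to counteract this kind of imbalance by reducing the weight of points that the network judges to lie in dense regions.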
### Main contributions
1. **Spherical interpolated convolution operator**: A dense spherical interpolated convolution operator is proposed for unstructured point clouds in 3D space. Under the same network structure, this operator uses fewer learnable parameters than the 3D grid-like convolution operator, which reduces memory usage.
2. **Distance-feature density**: Based on the characteristics of spatial feature computation, the concept of distance-feature density is proposed. Learning this density and incorporating it into spatial feature computation makes that computation more reasonable and effective.
3. **Semantic segmentation network design**: A semantic segmentation network is built on the proposed spherical interpolated convolution operator and distance-feature density. The experimental results show that this network performs well on 3D semantic segmentation tasks, especially on the ScanNet and Paris-Lille-3D datasets.
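The parameter saving in contribution 1 follows from simple counting: a kernel with K cells, each holding a C_in × C_out weight matrix, has K·C_in·C_out weights (bias ignored). The sketch below compares a 3×3×3 grid kernel (27 cells) with a spherical kernel whose cell count of 13 is a hypothetical illustration, not a figure from the paper; the channel sizes are also assumptions.

```python
def kernel_weight_count(num_cells: int, c_in: int, c_out: int) -> int:
    """Weights of a kernel with `num_cells` cells, each a C_in x C_out matrix (bias ignored)."""
    return num_cells * c_in * c_out

C_IN, C_OUT = 64, 128                                # hypothetical channel sizes
grid = kernel_weight_count(3 * 3 * 3, C_IN, C_OUT)   # 3x3x3 grid kernel: 27 cells
sphere = kernel_weight_count(13, C_IN, C_OUT)        # hypothetical 13-cell spherical kernel
print(grid, sphere)                                  # 221184 106496
```

Because C_in and C_out are shared by both layouts, the saving depends only on the ratio of cell counts, whatever the actual spherical cell layout is.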
### Method overview
1. **Spherical interpolated convolution operator**:
- Use farthest point sampling (FPS) to sample the output points from the input points.
- The spherical unit corresponding to each convolution kernel center obtains its unit features by interpolating the features of the surrounding points.
- Use 3D convolution to further process the interpolated features.
2. **Distance-feature density**:
- Collect distance and feature information within a small neighborhood of each point via a ball query.
- Apply a 1×1 convolution followed by a ReLU activation to learn the internal relationship between the distance and feature information.
- Extract the aggregated density feature through max pooling.
- Feed the density feature into a multi-layer perceptron (MLP) and obtain the distance-feature density through a Sigmoid activation.
- During interpolation, multiply each interpolation weight by the reciprocal of the distance-feature density to adjust how much each feature contributes.
3. **Network structure**:
- The encoder contains 5 layers. Each layer contains two spherical interpolated convolution operators: one for downsampling and one for feature extraction.
- The decoder also contains 5 layers. Each layer uses a spherical interpolated convolution operator for feature extraction and receives encoder features through skip connections.
- Finally, a fully connected layer with Dropout predicts the final result.
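The three steps listed under item 1 of the method overview (FPS, per-cell interpolation, convolution over the cells) can be sketched as an untrained numpy forward pass. Several assumptions are not taken from the paper. The kernel layout is hypothetical: 7 cells, namely a center cell plus six cells on the ±x, ±y and ±z axes. Each cell is interpolated from its k nearest input points with inverse-distance weights. The 3D convolution step is stood in for by a per-cell C_in × C_out weight matrix that is summed over cells.

```python
import numpy as np

def farthest_point_sampling(points, m, start=0):
    """Greedy FPS: indices of m points chosen to be mutually far apart."""
    idx = np.empty(m, dtype=int)
    idx[0] = start
    # Distance from every point to its nearest already-selected point.
    min_d = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        idx[i] = int(np.argmax(min_d))
        min_d = np.minimum(min_d, np.linalg.norm(points - points[idx[i]], axis=1))
    return idx

def spherical_kernel_offsets(radius):
    """Hypothetical layout: one center cell plus 6 cells on the +/- x, y, z axes."""
    axes = np.vstack([np.eye(3), -np.eye(3)])
    return np.vstack([np.zeros((1, 3)), radius * axes])            # (7, 3)

def interpolate_cells(points, feats, centers, offsets, k=3, eps=1e-8):
    """Interpolate a feature for every cell of every kernel center from its k nearest points."""
    cells = centers[:, None, :] + offsets[None, :, :]               # (M, K, 3)
    d = np.linalg.norm(cells[:, :, None, :] - points[None, None], axis=-1)  # (M, K, N)
    nn = np.argsort(d, axis=-1)[..., :k]                            # (M, K, k)
    nd = np.take_along_axis(d, nn, axis=-1)
    w = 1.0 / (nd + eps)                                            # inverse-distance weights
    w /= w.sum(axis=-1, keepdims=True)
    return np.einsum("mkj,mkjc->mkc", w, feats[nn])                 # (M, K, C_in)

def spherical_conv(cell_feats, weights):
    """Apply each cell's C_in x C_out matrix and sum over cells."""
    return np.einsum("mkc,kco->mo", cell_feats, weights)            # (M, C_out)

rng = np.random.default_rng(0)
pts = rng.uniform(-1, 1, size=(256, 3))
f = rng.normal(size=(256, 8))
sel = farthest_point_sampling(pts, 32)                              # output points
offsets = spherical_kernel_offsets(0.2)
cell_f = interpolate_cells(pts, f, pts[sel], offsets)
W = rng.normal(size=(offsets.shape[0], 8, 16)) * 0.1               # random placeholder weights
out = spherical_conv(cell_f, W)
print(out.shape)                                                    # (32, 16)
```

The dense distance matrix keeps the sketch short but costs O(M·K·N) memory. A practical version would use a spatial index or a GPU neighbor search instead.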
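The distance-feature density steps under item 2 of the method overview can likewise be sketched as an untrained numpy forward pass. The following are assumptions, not details from the paper:

- The per-neighbor input is the neighbor's Euclidean distance to the center, concatenated with the neighbor's features.
- The 1×1 convolution is written as a matrix multiply shared across neighbors.
- The MLP has one hidden layer.
- All parameters are random placeholders.
- The adjusted interpolation weights are renormalized to sum to 1.

```python
import numpy as np

def ball_query(points, centers, radius, nsample):
    """Up to `nsample` point indices within `radius` of each center, padded by repeating the first hit."""
    d = np.linalg.norm(centers[:, None, :] - points[None, :, :], axis=-1)  # (M, N)
    out = np.empty((centers.shape[0], nsample), dtype=int)
    for i, row in enumerate(d):
        hits = np.flatnonzero(row <= radius)
        if hits.size == 0:                      # fall back to the nearest point
            hits = np.array([np.argmin(row)])
        hits = hits[:nsample]
        out[i] = np.concatenate([hits, np.full(nsample - hits.size, hits[0])])
    return out

def relu(x):
    return np.maximum(x, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def distance_feature_density(points, feats, radius, nsample, params):
    """Per-point density in (0, 1) computed from neighbor distances and features."""
    idx = ball_query(points, points, radius, nsample)                   # (N, S)
    dist = np.linalg.norm(points[idx] - points[:, None, :], axis=-1)     # (N, S)
    x = np.concatenate([dist[..., None], feats[idx]], axis=-1)           # (N, S, 1 + C)
    h = relu(x @ params["w1"] + params["b1"])                            # shared 1x1 conv + ReLU
    g = h.max(axis=1)                                                    # max pool over neighbors
    g = relu(g @ params["w2"] + params["b2"])                            # MLP hidden layer
    return sigmoid(g @ params["w3"] + params["b3"])[:, 0]                # (N,)

def density_adjusted_weights(base_w, density, eps=1e-8):
    """Multiply interpolation weights by 1/density, then renormalize (renormalizing is an assumption)."""
    w = base_w / (density + eps)
    return w / w.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
N, C, H = 128, 4, 16
pts = rng.uniform(0, 1, size=(N, 3))
f = rng.normal(size=(N, C))
params = {
    "w1": rng.normal(size=(1 + C, H)) * 0.1, "b1": np.zeros(H),
    "w2": rng.normal(size=(H, H)) * 0.1, "b2": np.zeros(H),
    "w3": rng.normal(size=(H, 1)) * 0.1, "b3": np.zeros(1),
}
rho = distance_feature_density(pts, f, radius=0.2, nsample=16, params=params)
print(rho.shape)  # (128,)
```

Dividing by the density shifts interpolation weight toward neighbors with a lower learned density. For example, base weights [0.5, 0.5] with densities [1.0, 0.25] become [0.2, 0.8] under `density_adjusted_weights`.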
### Experimental results
The paper reports experiments on the ScanNet and Paris-Lille-3D datasets. The results show that the proposed spherical interpolated convolution network performs well on 3D semantic segmentation tasks and maintains high accuracy while using fewer network parameters.