Abstract: Deep neural networks have achieved promising performance in supervised point cloud applications, but manual annotation is extremely expensive and time-consuming in supervised learning schemes. Unsupervised domain adaptation (UDA) addresses this problem by training a model with only labeled data in the source domain but making the model generalize well in the target domain. Existing studies show that self-supervised learning using both source and target domain data can help improve the adaptability of trained models, but they all rely on hand-crafted designs of the self-supervised tasks. In this paper, we propose a learnable self-supervised task and integrate it into a self-supervision-based point cloud UDA architecture. Specifically, we propose a learnable nonlinear transformation that transforms a part of a point cloud to generate abundant and complicated point clouds while retaining the original semantic information, and the proposed self-supervised task is to reconstruct the original point cloud from the transformed ones. In the UDA architecture, an encoder is shared between the networks for the self-supervised task and the main task of point cloud classification or segmentation, so that the encoder can be trained to extract features suitable for both the source and the target domain data. Experiments on PointDA-10 and PointSegDA datasets show that the proposed method achieves new state-of-the-art performance on both classification and segmentation tasks of point cloud UDA. Code will be made publicly available.
What problem does this paper attempt to address?
The paper addresses unsupervised domain adaptation (UDA) for point cloud data. Specifically, the authors study how to train a model using only labeled data from the source domain, with no labels from the target domain, so that the model can still classify or segment target-domain data effectively. Because point cloud distributions vary greatly across domains, a model trained directly on the source domain usually performs poorly on the target domain. The paper therefore proposes a new self-supervised task: a learnable nonlinear transformation generates complex point clouds that retain the original semantic information, and the model is trained to reconstruct the original point cloud from the transformed one, which improves its generalization to the target domain.
### Main contributions of the paper
1. **Proposed a new learnable point cloud destruction-reconstruction self-supervised task**: Compared with existing hand-designed self-supervised tasks, this task can learn more cross-domain transferable features and reduce the distribution gap between domains. To the best of the authors' knowledge, this is the first learnable self-supervised task applied to point cloud processing and to the broader field of computer vision.
2. **Applied the proposed self-supervised task to point cloud UDA and developed a multi-region destruction strategy**: Destroying different regions of a point cloud and reconstructing the original encourages the encoder in the UDA architecture to focus on local features, which benefits domain adaptation.
3. **Evaluated on the PointDA-10 and PointSegDA datasets**: The experimental results show that the method reaches a new state-of-the-art level on both the classification and the segmentation tasks of point cloud UDA.
### Method overview
- **Problem definition**: In UDA for point cloud classification, we are given a source domain \(S=\{(x_{i}^{s},y_{i}^{s})\}_{i = 1}^{n_{s}}\) containing \(n_{s}\) labeled point clouds and a target domain \(T=\{x_{j}^{t}\}_{j = 1}^{n_{t}}\) containing \(n_{t}\) unlabeled point clouds. The goal is to train a classification network that generalizes well to the target domain.
- **UDA framework**: The self-supervision-based UDA framework contains two networks: the main task network \(f_{\text{main}}\) and the auxiliary task network \(f_{\text{aux}}\). The two networks share the same encoder \(f_{\text{enc}}\) but have different heads, \(f_{h_{\text{main}}}\) and \(f_{h_{\text{aux}}}\). The main task network is trained on source domain data, while the auxiliary task network is trained on both source and target domain data.
- **Self-supervised task**: A learnable nonlinear transformation network \(\phi_{\omega}\) transforms the point cloud \(x\) into \(x'\), and the auxiliary network \(f_{\text{aux}}\) then produces a reconstruction \(\hat{x}\) of the original point cloud from \(x'\). The objective maximizes the Chamfer distance (CD) between the transformed and the original point cloud while minimizing the Chamfer distance between the reconstructed and the original point cloud:
\[
\min\lambda_{1}\text{CD}(\hat{x},x)-\lambda_{2}\text{CD}(x',x)
\]
- **Multi-region transformation**: Transforming the whole point cloud at once can make reconstruction too difficult, so a multi-region local transformation strategy is introduced. Different parts of the point cloud are transformed at different random positions, which pushes the encoder to extract features from different parts and so helps it complete the reconstruction task.
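
The objective and the multi-region idea above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the Chamfer distance variant (mean squared nearest-neighbour distance, summed over both directions), the function names, the nearest-neighbour region selection, and the fixed `stretch` function standing in for the learnable \(\phi_{\omega}\) are all hypothetical choices made for this example.

```python
import math
import random


def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two point sets (lists of 3D tuples).

    Assumed variant: mean squared nearest-neighbour distance, summed over
    both directions. The paper may normalise differently.
    """
    def one_way(src, dst):
        return sum(min(math.dist(p, q) ** 2 for q in dst) for p in src) / len(src)

    return one_way(a, b) + one_way(b, a)


def self_supervised_loss(x_hat, x_prime, x, lam1=1.0, lam2=1.0):
    """lam1 * CD(x_hat, x) - lam2 * CD(x_prime, x), following the objective above."""
    return lam1 * chamfer_distance(x_hat, x) - lam2 * chamfer_distance(x_prime, x)


def transform_regions(points, num_regions, region_size, transform, rng):
    """Apply `transform` to the nearest neighbours of randomly chosen centres.

    Region selection by nearest neighbours is an assumption of this sketch.
    """
    out = list(points)
    for _ in range(num_regions):
        centre = rng.choice(points)
        nearest = sorted(range(len(points)),
                         key=lambda i: math.dist(points[i], centre))[:region_size]
        for i in nearest:
            out[i] = transform(points[i], centre)
    return out


def stretch(p, centre, factor=2.0):
    """Hypothetical fixed stand-in for the learnable transformation phi_omega."""
    return tuple(c + factor * (v - c) for v, c in zip(p, centre))


# Toy point cloud with four points.
x = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
x_prime = transform_regions(x, num_regions=1, region_size=2,
                            transform=stretch, rng=random.Random(0))

# With a perfect reconstruction (x_hat == x) only the second term remains,
# so the loss is negative whenever the transformation moved any point.
loss = self_supervised_loss(x_hat=x, x_prime=x_prime, x=x)
```

In an actual training loop the two terms would pull in opposite directions: the transformation is rewarded for moving points away from the original, while the shared encoder and the auxiliary head are rewarded for undoing that change.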
### Experimental results
- **Point cloud classification UDA**: On the PointDA-10 dataset, with PointNet and DGCNN as encoders, the method achieves the highest classification accuracy in multiple UDA scenarios.
- **Point cloud segmentation UDA**: On the PointSegDA dataset, this method also performs well, further verifying its effectiveness in different tasks.
In conclusion, the paper improves point cloud UDA performance on both classification and segmentation by proposing a new learnable self-supervised task.