Abstract: 3D pose transfer, which aims to transfer a desired pose to a target mesh, is one of the most challenging 3D generation tasks. Previous attempts rely on well-defined parametric human models or skeletal joints as driving pose sources. However, obtaining those clean pose sources requires cumbersome pre-processing pipelines, which hinders real-time applications. This work is driven by the intuition that a model's robustness can be enhanced by introducing adversarial samples into training, making the model less vulnerable to noisy inputs, and that this can be extended to handle real-world data such as raw point clouds and scans directly, without intermediate processing. Furthermore, we propose a novel 3D pose Masked Autoencoder (3D-PoseMAE), a customized MAE that effectively learns 3D extrinsic representations (i.e., pose). 3D-PoseMAE facilitates learning extrinsic attributes by simultaneously generating adversarial samples that perturb the model and learning arbitrary raw noisy poses via a multi-scale masking strategy. Both qualitative and quantitative studies show that the meshes transferred by our network are of much better quality. We also demonstrate the strong generalizability of our method on various poses, different domains, and even raw scans. Experimental results further show that the intermediate adversarial samples generated during training can successfully attack existing pose transfer models.
What problem does this paper attempt to address?
### Problems Addressed by the Paper
This paper addresses the robustness of 3D pose transfer. Existing methods rely on clean, well-defined parametric human models or skeletal joints as the driving pose source. Obtaining these sources requires cumbersome preprocessing, which limits real-time applications. The paper improves robustness and generalization through adversarial learning, so that the model can handle noisy inputs, and even raw point clouds or scans, directly without intermediate processing.
### Main Contributions
1. **Solution to Robustness Issue**:
- The authors describe this as the first attempt to address robustness in 3D pose transfer from the perspective of adversarial learning.
- Adversarial samples are generated to simulate noisy inputs and even raw scan data, which enables end-to-end pose transfer on raw point clouds and scans.
2. **Adversarial Learning Framework**:
- An adversarial learning framework tailored to 3D pose transfer is introduced, including a new Pose Transfer (PT) adversarial function and a method for computing adversarial samples on the fly during backpropagation.
- According to the authors, this is the first time adversarial samples have been computed on the fly in a 3D generative deep learning pipeline.
3. **3D-PoseMAE Architecture**:
- A novel architecture based on the Masked Autoencoder (MAE), called 3D-PoseMAE, is proposed. It captures extrinsic attributes (i.e., poses) through a multi-scale masking strategy and progressive channel attention operations.
- 3D-PoseMAE is reported to be computationally efficient while retaining strong generative capability.
4. **Experimental Results**:
- Experiments on multiple datasets and data sources show that the proposed method is robust and generalizes well to noisy inputs and to noisy real-world raw scans.
- The code will be publicly released.
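
The multi-scale masking idea named in contribution 3 can be illustrated with a short NumPy sketch. This is not the authors' implementation: the function name `multi_scale_masked_views`, the scale fractions, the mask ratio, and the choice to zero out masked points are all assumptions made for illustration only.

```python
import numpy as np

def multi_scale_masked_views(points, scales=(1.0, 0.5, 0.25), mask_ratio=0.3, seed=0):
    """Return one randomly downsampled, partially masked copy of `points` per scale.

    points:     (N, 3) array of pose-source coordinates.
    scales:     fraction of points kept at each scale (assumed values).
    mask_ratio: fraction of the kept points that are masked (assumed value).
    """
    rng = np.random.default_rng(seed)
    n = points.shape[0]
    views = []
    for s in scales:
        keep = max(1, int(round(n * s)))
        idx = rng.choice(n, size=keep, replace=False)   # random downsampling
        sub = points[idx].copy()
        n_mask = int(round(keep * mask_ratio))
        mask = np.zeros(keep, dtype=bool)
        mask[rng.choice(keep, size=n_mask, replace=False)] = True
        sub[mask] = 0.0                                  # masked points are zeroed in this sketch
        views.append((sub, mask))
    return views

# Example: three views of a random 1000-point cloud.
cloud = np.random.default_rng(42).normal(size=(1000, 3))
for view, mask in multi_scale_masked_views(cloud):
    print(view.shape, int(mask.sum()))
```

Each view holds the same pose at a different density, so an encoder fed all of them would have to recover the same extrinsic attributes from each one. How the actual encoder consumes and fuses these scales is not specified in this summary.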
### Method Overview
1. **Adversarial Sample Generation**:
- The 3D-PoseMAE model is run in evaluation mode to obtain gradients with respect to the input data. These gradients are used to perturb the mesh, creating adversarial samples.
- Adversarial samples are used for adversarial training to enhance the model's robustness.
2. **3D-PoseMAE Network Architecture**:
- **Multi-Scale Masked 3D Encoder**: The input point cloud is represented at multiple scales through random downsampling and masking, which drives the model to learn the same extrinsic attributes across the different scale representations.
- **3D-PoseMAE Decoder**: The decoder uses progressive channel attention operations to integrate pose and target information compactly, avoiding redundant spatial information.
3. **Optimization Objective**:
- The full training objective combines a reconstruction loss and an edge loss.
4. **Adversarial Training**:
- A new pose transfer adversarial function is constructed by minimizing the distance between the generated result and the real mesh.
- Adversarial samples are generated using methods such as PGD and used for adversarial training to enhance the model's robustness.
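
The PGD step mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's pipeline: a toy linear map `W` stands in for 3D-PoseMAE, the loss is a plain mean squared vertex distance, and the gradient is written analytically. In a real setup the gradient would come from automatic differentiation through the network in evaluation mode. The names `pt_loss`, `pt_loss_grad`, `pgd_attack`, and the values of `eps`, `alpha`, and `steps` are hypothetical.

```python
import numpy as np

def pt_loss(pose, W, target):
    """Mean squared vertex distance between the toy model output and the ground truth."""
    diff = pose @ W - target
    return float(np.mean(np.sum(diff ** 2, axis=1)))

def pt_loss_grad(pose, W, target):
    """Analytic gradient of pt_loss with respect to the pose vertices."""
    diff = pose @ W - target
    return 2.0 * diff @ W.T / pose.shape[0]

def pgd_attack(pose, W, target, eps=0.05, alpha=0.01, steps=10):
    """Projected gradient ascent on the loss inside an L-infinity ball of radius eps."""
    delta = np.zeros_like(pose)
    for _ in range(steps):
        grad = pt_loss_grad(pose + delta, W, target)
        delta = delta + alpha * np.sign(grad)   # step that increases the loss
        delta = np.clip(delta, -eps, eps)       # project back into the eps-ball
    return pose + delta

# Toy data: 100 pose vertices, a random linear "model", and a random ground truth.
rng = np.random.default_rng(0)
pose = rng.normal(size=(100, 3))
W = rng.normal(size=(3, 3))
target = rng.normal(size=(100, 3))

adv_pose = pgd_attack(pose, W, target)
print(pt_loss(pose, W, target), pt_loss(adv_pose, W, target))
```

In adversarial training, samples like `adv_pose` would be fed back as pose sources while the model parameters are updated to reduce the same distance. The perturbation budget `eps` bounds how far any vertex coordinate may move.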
### Experimental Validation
- **Quantitative Evaluation**: The model is evaluated on the SMPL-NPT and FAUST datasets, on both clean and adversarial samples.
- **Qualitative Visualization**: Visual results show the model generalizing to new poses and new identities, alongside examples of the generated adversarial samples.
- **Ablation Study**: The contribution of each component is evaluated to verify its effectiveness.
In summary, the paper improves the robustness and generalization of 3D pose transfer by combining adversarial learning with the 3D-PoseMAE architecture.