Abstract: Human Pose Estimation (HPE) is widely used in various fields, including motion analysis, healthcare, and virtual reality. However, the great expenses of labeled real-world datasets present a significant challenge for HPE. To overcome this, one approach is to train HPE models on synthetic datasets and then perform domain adaptation (DA) on real-world data. Unfortunately, existing DA methods for HPE neglect data privacy and security by using both source and target data in the adaptation process. To this end, we propose a new task, named source-free domain adaptive HPE, which aims to address the challenges of cross-domain learning of HPE without access to source data during the adaptation process. We further propose a novel framework that consists of three models: source model, intermediate model, and target model, which explores the task from both source-protect and target-relevant perspectives. The source-protect module preserves source information more effectively while resisting noise, and the target-relevant module reduces the sparsity of spatial representations by building a novel spatial probability space, and pose-specific contrastive learning and information maximization are proposed on the basis of this space. Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin. The code is available at https://github.com/davidpengucf/SFDAHPE.
What problem does this paper attempt to address?
This paper addresses cross-domain human pose estimation (HPE) without access to source-domain data. Existing domain adaptation (DA) methods for HPE use both source and target data during adaptation, which raises data privacy and security risks and limits where the methods can be deployed. To overcome this, the paper proposes a new task, source-free domain adaptive human pose estimation, in which only the pretrained source model and the target-domain data are available for reducing the domain gap and improving performance on the target domain.
### Main Contributions
1. **Proposing a New Task**: The paper defines a new task, source-free domain adaptive human pose estimation, which targets cross-domain learning without access to source-domain data.
2. **Innovative Framework**: A new framework consisting of three models is proposed: the source model, the intermediate model, and the target model. Together they approach the task from two perspectives: source protection and target relevance.
3. **Experimental Verification**: Comprehensive experiments on several domain adaptive HPE benchmarks show that the proposed method outperforms existing approaches by a considerable margin.
### Method Overview
- **Source Model Pretraining**: The source model is first pretrained on labeled source-domain data so that it captures the knowledge of the source domain.
- **Adaptation Framework**: The adaptation process is divided into two steps:
  - **Source-Protection Adaptation (Step A)**: Catastrophic forgetting is prevented by freezing the feature extractor of the source model and updating its regressor, while a residual loss suppresses the noise caused by domain shift.
  - **Target-Relevance Adaptation (Step B)**: The differences between domains are minimized through the interaction between the intermediate model and the target model. This step introduces contrastive learning and information maximization, together with projection vectors that reduce the sparsity of the heatmap.
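
The "projection vectors" mentioned in Step B are not defined in this summary. One plausible reading, offered here only as an assumption, is that each 2D heatmap is normalized into a probability map and then marginalized onto the x and y axes, which replaces one sparse 2D map with two dense 1D distributions. The sketch below illustrates that reading in plain Python; the function names `spatial_softmax` and `project_to_axes` are hypothetical and do not come from the paper or its repository.

```python
import math


def spatial_softmax(heatmap):
    """Turn a 2D heatmap (list of rows) into a probability map that sums to 1."""
    peak = max(max(row) for row in heatmap)  # subtract the max for numerical stability
    exps = [[math.exp(v - peak) for v in row] for row in heatmap]
    total = sum(sum(row) for row in exps)
    return [[v / total for v in row] for row in exps]


def project_to_axes(prob_map):
    """Marginalize an H x W probability map into a length-W and a length-H vector.

    Assumption: this axis-wise marginalization stands in for the paper's
    projection step, which the summary does not spell out.
    """
    p_x = [sum(col) for col in zip(*prob_map)]  # sum over rows: one value per column
    p_y = [sum(row) for row in prob_map]        # sum over columns: one value per row
    return p_x, p_y


# Toy 3 x 4 heatmap with a single activated pixel at row 1, column 2.
heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
p_x, p_y = project_to_axes(spatial_softmax(heatmap))
```

In the toy heatmap only one of twelve pixels carries the peak, whereas the two projected vectors have just four and three entries, so the peak occupies a much larger share of each representation.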
### Technical Details
- **Residual Loss**: The pixel with the highest confidence is removed from the heatmap, and a KL divergence on the remaining values keeps the heatmap of the intermediate model close to that of the source model, so that source information is retained.
- **Contrastive Learning**: Positive and negative sample pairs are defined so that differences between heatmaps at the same position are minimized and differences between heatmaps at different positions are maximized, which improves the generalization ability of the model.
- **Information Maximization**: The diversity of the target model's outputs is maximized, which reduces the gap between domains.
- **Consistency Loss**: The differences between the outputs of the intermediate model and the target model are minimized, which keeps the two models consistent.
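
Three of the losses above can be sketched on flattened heatmaps treated as probability vectors. This is a minimal illustration under stated assumptions, not the paper's implementation: the peak is masked at the source model's argmax in both maps, information maximization uses the common formulation from source-free classification (mean per-sample entropy minus the entropy of the batch mean), and the consistency term is a mean squared error. All function names are hypothetical.

```python
import math

EPS = 1e-12  # avoids log(0) and division by zero


def normalize(values):
    """Rescale non-negative values so that they sum to 1."""
    total = sum(values) + EPS * len(values)
    return [(v + EPS) / total for v in values]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + EPS) / (qi + EPS)) for pi, qi in zip(p, q))


def residual(prob, peak_index):
    """Drop the pixel at peak_index and renormalize what is left."""
    return normalize([v for i, v in enumerate(prob) if i != peak_index])


def residual_loss(source_prob, intermediate_prob):
    """KL between the two heatmaps after removing the source model's peak pixel.

    Assumption: the same pixel (the source argmax) is removed from both maps.
    """
    peak = source_prob.index(max(source_prob))
    return kl_divergence(residual(source_prob, peak),
                         residual(intermediate_prob, peak))


def entropy(p):
    return -sum(pi * math.log(pi + EPS) for pi in p)


def information_maximization(batch):
    """Mean per-sample entropy minus entropy of the batch-mean distribution.

    Lower is better: each prediction is confident, while predictions differ
    across the batch.
    """
    mean_entropy = sum(entropy(p) for p in batch) / len(batch)
    mean_prob = [sum(col) / len(batch) for col in zip(*batch)]
    return mean_entropy - entropy(mean_prob)


def consistency_loss(intermediate_prob, target_prob):
    """Mean squared difference between intermediate and target outputs."""
    return sum((a - b) ** 2
               for a, b in zip(intermediate_prob, target_prob)) / len(target_prob)


# Toy flattened heatmaps with four pixels each.
source = [0.7, 0.1, 0.1, 0.1]
weaker_peak = [0.4, 0.2, 0.2, 0.2]    # lower peak, same residual shape
shifted = [0.7, 0.25, 0.03, 0.02]     # same peak, different residual shape

loss_weaker_peak = residual_loss(source, weaker_peak)
loss_shifted = residual_loss(source, shifted)

diverse = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01]]
collapsed = [[0.98, 0.01, 0.01], [0.98, 0.01, 0.01]]
im_diverse = information_maximization(diverse)
im_collapsed = information_maximization(collapsed)
```

In this toy setup the residual loss ignores a change in the peak value and reacts only to changes in the remaining pixels, and the information maximization term is lower for the batch whose confident predictions differ than for the batch that collapses onto one output.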
### Experimental Results
- **Datasets**: Three human pose datasets (SURREAL, Human3.6M, and Leeds Sports Pose, LSP) and three hand pose datasets (Rendered Hand Pose Dataset, RHD; Hand-3D-Studio, H3D; and FreiHand) are used for verification.
- **Main Results**: The proposed method outperforms existing methods on multiple domain adaptation tasks, especially the human pose tasks SURREAL to Human3.6M and SURREAL to LSP, and the hand pose tasks RHD to H3D and RHD to FreiHand.
In conclusion, by defining a new task and proposing a three-model adaptation framework, this paper addresses the key problems of source-free domain adaptive human pose estimation and opens a new direction for research in this field.