Unsupervised Domain Adaptation Learning for Hierarchical Infant Pose Recognition with Synthetic Data

Cheng-Yen Yang, Zhongyu Jiang, Shih-Yu Gu, Jenq-Neng Hwang, Jang-Hee Yoo
DOI: https://doi.org/10.48550/arXiv.2205.01892
2022-05-04
Abstract: The Alberta Infant Motor Scale (AIMS) is a well-known assessment scheme that evaluates the gross motor development of infants by recording the number of specific poses achieved. With the aid of an image-based pose recognition model, the AIMS evaluation procedure can be shortened and automated, providing an early diagnosis or indicator of potential developmental disorders. Due to limited public infant-related datasets, many works use the SMIL-based method to generate synthetic infant images for training. However, this domain mismatch between real and synthetic training samples often leads to performance degradation during inference. In this paper, we present a CNN-based model which takes any infant image as input and predicts the coarse and fine-level pose labels. The model consists of an image branch and a pose branch, which respectively generate the coarse-level logits facilitated by the unsupervised domain adaptation and the 3D keypoints using the HRNet with SMPLify optimization. Then the outputs of these branches will be sent into the hierarchical pose recognition module to estimate the fine-level pose labels. We also collect and label a new AIMS dataset, which contains 750 real and 4000 synthetic infant images with AIMS pose labels. Our experimental results show that the proposed method can significantly align the distribution of synthetic and real-world datasets, thus achieving accurate performance on fine-grained infant pose recognition.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the domain shift that arises when infant pose recognition models are trained on synthetic data because real infant image datasets are scarce. Specifically, it uses Unsupervised Domain Adaptation (UDA) to reduce the distribution difference between synthetic and real data, thereby improving fine-grained pose recognition on real infant images.

### Main research problems:

1. **Domain shift**: For privacy and ethical reasons, large amounts of labeled real infant image data are very difficult to obtain, so most studies rely on synthetic data for training. The resulting mismatch between synthetic training data and real test data degrades model performance.
2. **Fine-grained infant pose recognition**: Traditional infant pose assessment schemes such as AIMS must be carried out manually by professionals, which is time-consuming and inefficient. The paper proposes an automated deep-learning method for fine-grained infant pose recognition, to assist early diagnosis or indicate potential developmental disorders.

### Solutions:

- **Dataset construction**: The paper collects and labels a new AIMS dataset containing 750 real infant images and 4,000 synthetic infant images. Each image has coarse- and fine-level pose labels.
- **Unsupervised Domain Adaptation (UDA)**: UDA reduces the distribution difference between synthetic and real data and improves the model's generalization to real data.
- **Multi-branch model**: The proposed CNN model has an image branch and a pose branch. The image branch generates coarse-level pose logits, and the pose branch produces 3D keypoints using HRNet with SMPLify optimization. The outputs of both branches are then passed to the Hierarchical Pose Recognition Module (HIPC), which estimates the fine-grained pose labels.

### Experimental results:

- **Coarse-level classification**: With the LMMD loss, the Top-1 accuracy of the image branch increases from 75.5% to 85.3%.
- **Fine-grained classification**: Using the output of the image branch to guide the prediction of the pose branch raises the Top-1 accuracy of fine-grained classification from 68.0% to 76.8%.

### Conclusion:

By combining unsupervised domain adaptation with a hierarchical pose recognition framework, the paper reduces the domain shift between synthetic and real data and achieves a significant improvement on the fine-grained infant pose recognition task. The method could support the early diagnosis and assessment of infant motor development.
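
The summary says that the image branch's coarse-level output guides the pose branch's fine-level prediction, but it does not say how the two are combined. The following is a minimal sketch of one plausible way to do this, not the paper's actual module: each fine-level probability is reweighted by the probability of its parent coarse class and the result is renormalized. The coarse names follow the four AIMS positions (prone, supine, sitting, standing). The fine-level label names, the `FINE_TO_COARSE` mapping, the `hierarchical_predict` function, and all logit values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical label hierarchy: fine-level label -> parent coarse-level label.
# The fine-level names are placeholders, not the dataset's actual AIMS items.
FINE_TO_COARSE = {
    "prone_a": "prone",
    "prone_b": "prone",
    "supine_a": "supine",
    "sitting_a": "sitting",
    "sitting_b": "sitting",
    "standing_a": "standing",
}


def softmax(logits):
    """Convert a dict of label -> logit into a dict of label -> probability."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {label: math.exp(value - peak) for label, value in logits.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


def hierarchical_predict(coarse_logits, fine_logits):
    """Reweight fine-level probabilities by their parent's coarse probability.

    This multiplicative rule is an assumption made for this sketch.
    """
    coarse_p = softmax(coarse_logits)
    fine_p = softmax(fine_logits)
    joint = {f: fine_p[f] * coarse_p[FINE_TO_COARSE[f]] for f in fine_p}
    total = sum(joint.values())
    scores = {f: s / total for f, s in joint.items()}
    return max(scores, key=scores.get), scores


# Made-up logits: the image branch is confident about "sitting", while the
# pose branch on its own would narrowly pick a prone label.
coarse_logits = {"prone": 0.5, "supine": 0.0, "sitting": 3.0, "standing": 0.0}
fine_logits = {
    "prone_a": 2.0, "prone_b": 0.0, "supine_a": 0.0,
    "sitting_a": 1.8, "sitting_b": 0.0, "standing_a": 0.0,
}

best, scores = hierarchical_predict(coarse_logits, fine_logits)
print(best)  # prints "sitting_a"
```

In this toy case the pose branch alone would choose `prone_a`, but the confident coarse-level prediction shifts the final label to `sitting_a`. This is the kind of correction that coarse-level guidance is meant to provide when the fine-level scores are ambiguous.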