SynPlay: Importing Real-world Diversity for a Synthetic Human Dataset

Jinsub Yim, Hyungtae Lee, Sungmin Eum, Yi-Ting Shen, Yan Zhang, Heesung Kwon, Shuvra S. Bhattacharyya
2024-08-22
Abstract: We introduce Synthetic Playground (SynPlay), a new synthetic human dataset that aims to bring out the diversity of human appearance in the real world. We focus on two factors to achieve a level of diversity that has not yet been seen in previous works: i) realistic human motions and poses and ii) multiple camera viewpoints towards human instances. We first use a game engine and its library-provided elementary motions to create games where virtual players can take less-constrained and natural movements while following the game rules (i.e., rule-guided motion design as opposed to detail-guided design). We then augment the elementary motions with real human motions captured with a motion capture device. To render various human appearances in the games from multiple viewpoints, we use seven virtual cameras encompassing the ground and aerial views, capturing abundant aerial-vs-ground and dynamic-vs-static attributes of the scene. Through extensive and carefully designed experiments, we show that using SynPlay in model training leads to enhanced accuracy over existing synthetic datasets for human detection and segmentation. The benefit of SynPlay becomes even greater for tasks in the data-scarce regime, such as few-shot and cross-domain learning tasks. These results clearly demonstrate that SynPlay can be used as an essential dataset with rich attributes of complex human appearances and poses suitable for model pretraining. The SynPlay dataset, comprising over 73k images and 6.5M human instances, is available for download at https://synplaydataset.github.io/.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the insufficient diversity of human appearance in existing synthetic datasets. The authors point out that although existing synthetic datasets have improved in scale and fidelity, they remain limited in diversity and realism, especially for tasks that require recognizing the overall appearance of a person from a distance. To overcome these limitations, the authors propose a new synthetic human dataset, SynPlay.

#### Main problems and solutions

1. **Lack of realistic movement and posture**
   - **Problem**: Existing synthetic datasets can generate diverse human models, but their movements and postures are not realistic enough, which limits their usefulness in practical applications.
   - **Solution**: SynPlay produces natural movements and postures by letting virtual players move freely under game rules in a virtual environment, and by augmenting the game engine's elementary motions with real human motions recorded with a motion capture device. This increases both the diversity and the realism of the motions.
2. **Limited viewpoints**
   - **Problem**: Most synthetic datasets capture images from only one or a few fixed viewpoints, which narrows the range of tasks the dataset can support, especially multi-view tasks.
   - **Solution**: SynPlay renders each scene with seven virtual cameras that cover both aerial and ground views, including unmanned aerial vehicles (UAVs), closed-circuit television (CCTV), and unmanned ground vehicles (UGVs), providing a wide range of viewpoint variation.
3. **Poor performance in data-scarce tasks**
   - **Problem**: In data-scarce tasks such as few-shot learning and cross-domain learning, existing synthetic datasets do not help enough.
   - **Solution**: The experiments show that training with SynPlay improves accuracy over existing synthetic datasets for human detection and segmentation, and that the benefit grows in the data-scarce regime.

### Summary

By constructing the SynPlay dataset, this paper addresses the shortcomings of existing synthetic datasets in motion diversity, realism, and viewpoint diversity. By importing real-world human motions and capturing scenes from multiple viewpoints, SynPlay yields better results than existing synthetic datasets on human detection and segmentation, particularly in data-scarce settings.
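
As a rough sanity check on the dataset scale quoted in the abstract, the two headline figures (over 73k images, 6.5M human instances) can be combined into an average number of human instances per image. The sketch below treats both figures as rounded values, so the result is only an order-of-magnitude estimate; the function and constant names are illustrative and do not come from the paper or the dataset tooling.

```python
# Rounded figures quoted in the abstract ("over 73k images", "6.5M human instances").
# They are approximations, so the derived density is only a rough estimate.
NUM_IMAGES = 73_000
NUM_INSTANCES = 6_500_000


def mean_instances_per_image(instances: int, images: int) -> float:
    """Return the average number of annotated instances per image."""
    if images <= 0:
        raise ValueError("images must be a positive integer")
    return instances / images


density = mean_instances_per_image(NUM_INSTANCES, NUM_IMAGES)
print(f"roughly {density:.0f} human instances per image on average")
```

An average on the order of tens of people per frame is consistent with the summary's emphasis on recognizing people from a distance, where many small instances share one image.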