SynthDa: Exploiting Existing Real-World Data for Usable and Accessible Synthetic Data Generation

Megani Rajendran, Chek Tien Tan, Indriyati Atmosukarto, Andrew Grant, Aik Beng Ng, Zhihua Zhou, Simon See
DOI: https://doi.org/10.1145/3610543.3626168
2023-01-01
Abstract: Acquiring real-world data for computer vision presents challenges such as data scarcity, high costs, and privacy concerns. We introduce SynthDa, an automated approach for usable synthetic data generation (SDG) that empowers users with varying expertise to create diverse synthetic data from existing real-world datasets. It combines pose estimation, synthetic scene creation, and domain randomization to offer data variants. Because SynthDa makes SDG easy, users can generate different permutations and combinations of synthetic data and explore the efficacy of various data configurations for their specific AI tasks. Our experiments across multiple existing datasets and models demonstrate the utility of SynthDa in challenging assumptions such as the "more data, the better" paradigm, revealing that excessive synthetic data may degrade performance and vice versa. In a pilot user study with 24 participants, we show the perceived usefulness of SynthDa as a promising SDG tool for overcoming challenges related to real-world data acquisition.
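The abstract describes combining domain-randomized synthetic variants with real data in different proportions to test whether adding more synthetic data actually helps. The following is a minimal Python sketch of that kind of experiment setup, assuming hypothetical randomization axes (background, lighting, camera angle) and a simple ratio-based mixing scheme. None of these names or parameters come from SynthDa itself; they only illustrate how permutations of synthetic data and real:synthetic ratios could be enumerated.

```python
import itertools
import random
from dataclasses import dataclass

# Hypothetical domain-randomization axes (assumption: not SynthDa's actual options).
BACKGROUNDS = ["indoor", "outdoor"]
LIGHTING = ["bright", "dim"]
CAMERA_ANGLES = ["front", "side"]


@dataclass(frozen=True)
class Variant:
    background: str
    lighting: str
    camera: str


def randomization_variants():
    """Return every combination of the hypothetical randomization axes."""
    return [
        Variant(b, l, c)
        for b, l, c in itertools.product(BACKGROUNDS, LIGHTING, CAMERA_ANGLES)
    ]


def mix_dataset(real, synthetic, ratio, seed=0):
    """Keep all real samples and add ratio * len(real) synthetic samples,
    capped at the size of the synthetic pool. Sampling is seeded for repeatability."""
    n_syn = min(len(synthetic), round(ratio * len(real)))
    rng = random.Random(seed)
    return list(real) + rng.sample(list(synthetic), n_syn)


# Toy placeholders: 10 "real" clips and 5 synthetic clips per randomized variant.
real = [f"real_{i}" for i in range(10)]
synthetic = [(v, i) for v in randomization_variants() for i in range(5)]
RATIOS = [0.0, 0.5, 1.0, 2.0]  # synthetic samples per real sample

if __name__ == "__main__":
    for r in RATIOS:
        # Each configuration would be used to train and evaluate a model separately.
        print(r, len(mix_dataset(real, synthetic, r)))
```

Sweeping the ratio, rather than always using the full synthetic pool, is what makes it possible to observe the effect the abstract reports, where more synthetic data does not necessarily improve performance.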