Abstract: Vision-based object detectors are a crucial basis for robotics applications, as they provide valuable information about object localisation in the environment. These detectors must remain reliable under varying lighting conditions, occlusions, and visual artifacts, all while running in real time. Collecting and annotating real-world data for these networks is prohibitively time-consuming and costly, especially for custom assets such as industrial objects, which makes it untenable for generalizing to in-the-wild scenarios. To this end, we present Synthetica, a method for large-scale synthetic data generation for training robust state estimators. This paper focuses on the task of object detection, an important problem that can serve as the front-end for most state estimation problems, such as pose estimation. Leveraging data from a photorealistic ray-tracing renderer, we scale up data generation, generating 2.7 million images, to train highly accurate real-time detection transformers. We present a collection of rendering randomization and training-time data augmentation techniques conducive to robust sim-to-real performance for vision tasks. We demonstrate state-of-the-art performance on the task of object detection with detectors that run at 50-100 Hz, which is 9 times faster than the prior SOTA. We further demonstrate the usefulness of our training methodology for robotics applications by showcasing a pipeline for use in the real world with custom objects for which no prior datasets exist. Our work highlights the importance of scaling synthetic data generation for robust sim-to-real transfer while achieving the fastest real-time inference speeds. Videos and supplementary information can be found at https://sites.google.com/view/synthetica-vision.

Synthetic images generation for text detection and recognition in the wild

Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes

Is synthetic data from generative models ready for image recognition?

Synthetica: Large Scale Synthetic Data for Robot Perception

Chinese Text Detection Using Deep Learning Model And Synthetic Data

Scene Text Synthesis for Efficient and Effective Deep Network Training

Efficient Realistic Data Generation Framework leveraging Deep Learning-based Human Digitization

Perceptual Pyramid Adversarial Networks for Text-to-Image Synthesis

SynthTIGER: Synthetic Text Image GEneratoR Towards Better Text Recognition Models

Generalizable Synthetic Image Detection via Language-guided Contrastive Learning

Generating Synthetic Satellite Imagery With Deep-Learning Text-to-Image Models -- Technical Challenges and Implications for Monitoring and Verification

Enhancing Scene Text Detectors with Realistic Text Image Synthesis Using Diffusion Models

Deep Image Fingerprint: Towards Low Budget Synthetic Image Detection and Model Lineage Analysis

Improving Text Generation on Images with Synthetic Captions

AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks

Learning to Generate Synthetic Data via Compositing

Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization

Detection of AI-Generated Synthetic Images with a Lightweight CNN

Generating Synthetic Handwritten Historical Documents With OCR Constrained GANs

Text to Image Synthesis Using Two-Stage Generation and Two-Stage Discrimination