Abstract: The need for large amounts of training and validation data is a major obstacle to scaling AI algorithms for autonomous driving. Semantic Image Synthesis (SIS), or label-to-image translation, promises to address this issue by translating semantic layouts into images, enabling controllable generation of photorealistic data. However, SIS models require a large amount of paired data, which incurs extra cost. In this work, we present a new task: given a dataset of synthetic images with labels and a dataset of unlabeled real images, our goal is to learn a model that generates images with the content of the input mask and the appearance of real images. This new task reframes the well-known unsupervised SIS task in a more practical setting, where we leverage cheaply available synthetic data from a driving simulator to learn how to generate photorealistic images of urban scenes. This stands in contrast to previous works, which assume that labels and images come from the same domain but are unpaired during training. We find that previous unsupervised works underperform on this task, as they do not handle the distribution shift between the two domains. To address this problem, we propose a novel framework with two main contributions. First, we use the synthetic image as a guide to the content of the generated image by penalizing the difference between their high-level features at the patch level. Second, in contrast to previous works, which employ a single discriminator that overfits to the semantic distribution of the target domain, we employ one discriminator for the whole image and multiscale discriminators on image patches. Extensive comparisons on the GTA-V $\rightarrow$ Cityscapes and GTA-V $\rightarrow$ Mapillary benchmarks show that the proposed model outperforms state-of-the-art methods on this task.
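
As a rough illustration of the first contribution, a patch-level feature penalty could be sketched as follows. This is a minimal sketch under assumptions the abstract does not state: the function name `patch_feature_loss`, the L1 distance, and the non-overlapping square patches are all hypothetical choices, and plain nested lists stand in for feature maps that would in practice come from a pretrained feature extractor.

```python
from typing import List, Tuple

# A feature map indexed as [channel][row][col]; a stand-in for the
# high-level features of an image (assumption: produced elsewhere).
FeatureMap = List[List[List[float]]]


def patch_feature_loss(
    feat_generated: FeatureMap,
    feat_synthetic: FeatureMap,
    patch: int,
) -> Tuple[float, List[float]]:
    """Mean absolute feature difference, computed per patch.

    Returns the average over all patches and the per-patch values.
    The L1 distance and non-overlapping patches are illustrative
    assumptions, not the paper's confirmed formulation.
    """
    channels = len(feat_generated)
    height = len(feat_generated[0])
    width = len(feat_generated[0][0])
    if height % patch != 0 or width % patch != 0:
        raise ValueError("feature map size must be divisible by patch size")

    per_patch: List[float] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            total = 0.0
            for c in range(channels):
                for r in range(top, top + patch):
                    for col in range(left, left + patch):
                        total += abs(
                            feat_generated[c][r][col] - feat_synthetic[c][r][col]
                        )
            # Normalize by the number of feature values in this patch.
            per_patch.append(total / (channels * patch * patch))

    return sum(per_patch) / len(per_patch), per_patch


# Usage: one channel, a 2x2 feature map, 1x1 patches.
generated = [[[0.0, 0.0], [0.0, 0.0]]]
synthetic = [[[1.0, 1.0], [1.0, 3.0]]]
loss, patches = patch_feature_loss(generated, synthetic, patch=1)
```

Keeping the per-patch values, rather than only the global mean, makes it possible to see which regions of the generated image drift from the content of the synthetic guide.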

Image‐level Dataset Synthesis with an End‐to‐end Trainable Framework

Learning to Simulate Complex Scenes for Street Scene Segmentation

Learning to Simulate Labelled Datasets with an Image-Level Content Consistent Graph Constraint

Fine-grained Semantic Constraint in Image Synthesis

DCL: Differential Contrastive Learning for Geometry-Aware Depth Synthesis

Attribute Descent: Simulating Object-Centric Datasets on the Content Level and Beyond

Is Synthetic Image Useful for Transfer Learning? An Investigation into Data Generation, Volume, and Utilization

Beyond Photo Realism for Domain Adaptation from Synthetic Data

Development of a Virtual Environment for Rapid Generation of Synthetic Training Images for Artificial Intelligence Object Recognition

Towards Pragmatic Semantic Image Synthesis for Urban Scenes

Configurable 3D Scene Synthesis and 2D Image Rendering with Per-pixel Ground Truth Using Stochastic Grammars

AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks

One-Shot Real-to-Sim via End-to-End Differentiable Simulation and Rendering

The Unmet Promise of Synthetic Training Images: Using Retrieved Real Images Performs Better

Exploring Generative AI for Sim2Real in Driving Data Synthesis

DatasetDM: Synthesizing Data with Perception Annotations Using Diffusion Models

Visual Car Brand Classification by Implementing a Synthetic Image Dataset Creation Pipeline

Real-Fake: Effective Training Data Synthesis Through Distribution Matching

MINERVAS: Massive INterior EnviRonments VirtuAl Synthesis

PUG: Photorealistic and Semantically Controllable Synthetic Data for Representation Learning

A Shared Representation for Photorealistic Driving Simulators