Abstract: Diffusion models have recently been employed to generate high-quality images, reducing the need for manual data collection and improving model generalization in tasks such as object detection, instance segmentation, and image perception. However, synthesis frameworks are usually designed with meticulous human effort for each task because of differing requirements on image layout, content, and annotation formats, which restricts the application of synthetic data to more general scenarios. In this paper, we propose AnySynth, a unified framework integrating adaptable, comprehensive, and highly controllable components capable of generating arbitrary types of synthetic data given diverse requirements. Specifically, the Task-Specific Layout Generation Module is first introduced to produce reasonable layouts for different tasks by leveraging the generation ability of large language models and the layout priors of real-world images. A Uni-Controlled Image Generation Module is then developed to create high-quality, controllable synthetic images based on the generated layouts. In addition, user-specific reference images and style images can be incorporated into the generation process to meet task requirements. Finally, the Task-Oriented Annotation Module provides precise and detailed annotations for the generated images across different tasks. We have validated our framework's performance across various tasks, including Few-shot Object Detection, Cross-domain Object Detection, Zero-shot Composed Image Retrieval, and Multi-modal Image Perception and Grounding. The task-specific data synthesized by our framework significantly improves model performance in these tasks, demonstrating the generality and effectiveness of our framework.

AnySynth: Harnessing the Power of Image Synthetic Data Generation for Generalized Vision-Language Tasks