Abstract: Low-quality or scarce data poses significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute substantially new data, diffusion models open a new door to building self-evolving AI by generating high-quality and diverse synthetic data through text-guided prompts. However, text-only guidance cannot control the synthetic images' proximity to the original images, resulting in out-of-distribution data that is detrimental to model performance. To overcome this limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn, whereas with weaker image guidance, the synthetic images are easier for the model but contribute to a larger distribution gap with the original data. The generated full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve hard data learning. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. It focuses on lower-guidance images of high quality to learn prototypical features as a warm-up for learning higher-guidance images that might be weak in diversity or quality. Extensive experiments showcase gains of 2.7% and 2.1% in OOD and ID macro-accuracy, respectively, when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and leads to a 4.02% improvement in all-class accuracy.

What problem does this paper attempt to address?

The paper addresses the poor training performance of deep neural networks in practical applications when data is low in quality or insufficient in quantity. Specifically:

1. **Data Quality Issues**: In many real-world scenarios, data is collected from uncontrolled environments, so neither its quality nor its quantity is guaranteed. For example, images captured by field cameras, traffic cameras, sports cameras, or robot cameras may be affected by lighting conditions, weather, motion blur, or object positions, all of which can degrade data quality.
2. **Data Imbalance Issues**: The number of samples per category in the collected data may be highly imbalanced, leading to poor model performance on minority categories (i.e., tail categories).
3. **Data Distribution Gap Issues**: Low-quality or insufficient data can widen the distribution gap between training data and test data, hurting the model's generalization performance.

To address these issues, the paper proposes a method called "Diffusion Curriculum" (DisCL). DisCL compensates for the deficiencies of the original data by generating high-quality and diverse synthetic data, and it controls the gap between synthetic and real data by adjusting the image guidance level used during generation. DisCL consists of two stages:

1. **Synthetic to Real Data Generation**: A pre-trained model identifies "hard samples" in the original data, and a full spectrum of data, from fully synthetic to nearly real, is generated for them by varying the image guidance level.
2. **Generative Curriculum Learning**: Appropriate synthetic data is selected for training according to the current training stage.

In this way, DisCL can adjust the quality, diversity, and difficulty of the data at different training stages, thereby improving the model's performance on hard data.

The paper validates DisCL on two challenging tasks: long-tail classification and learning from low-quality data. On the iWildCam dataset, DisCL yields gains of 2.7% and 2.1% in OOD and ID macro-accuracy, respectively. On ImageNet-LT, it improves the base model's tail-class accuracy from 4.4% to 23.64% and all-class accuracy by 4.02%.
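
The two stages described in this summary can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Sample` class, the confidence threshold used to flag hard samples, the example guidance values, and the linear low-to-high schedule are all assumptions made for illustration. The sketch only shows the idea of starting with lower-guidance (more synthetic) images as a warm-up and moving toward higher-guidance (closer to real) images in later stages.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Sample:
    sample_id: str
    # Hypothetical score: a pre-trained model's probability for the true class.
    confidence: float


def select_hard_samples(samples: Sequence[Sample], threshold: float = 0.5) -> List[Sample]:
    """Flag samples the pre-trained model is unsure about (assumed criterion)."""
    return [s for s in samples if s.confidence < threshold]


def guidance_for_stage(stage: int, num_stages: int, levels: Sequence[float]) -> float:
    """Map a training stage to an image guidance level, from low to high.

    This fixed schedule is an assumption; the summary only says that data is
    selected according to the training stage.
    """
    if num_stages < 1 or not 0 <= stage < num_stages:
        raise ValueError("stage must satisfy 0 <= stage < num_stages")
    if not levels:
        raise ValueError("levels must be non-empty")
    ordered = sorted(levels)
    if num_stages == 1:
        return ordered[-1]
    # Integer arithmetic keeps the mapping monotonic and ends at the top level.
    index = stage * (len(ordered) - 1) // (num_stages - 1)
    return ordered[index]


def build_curriculum(
    samples: Sequence[Sample],
    num_stages: int,
    levels: Sequence[float],
    threshold: float = 0.5,
) -> List[Tuple[int, float, List[str]]]:
    """Return (stage, guidance level, hard sample ids) for every stage."""
    hard_ids = [s.sample_id for s in select_hard_samples(samples, threshold)]
    return [
        (stage, guidance_for_stage(stage, num_stages, levels), hard_ids)
        for stage in range(num_stages)
    ]


if __name__ == "__main__":
    data = [Sample("a", 0.9), Sample("b", 0.2), Sample("c", 0.4)]
    for stage, level, ids in build_curriculum(data, num_stages=3, levels=[0.1, 0.5, 0.9]):
        print(f"stage {stage}: guidance {level}, synthesize for {ids}")
```

In a real pipeline, each `(guidance level, sample id)` pair would be passed to an image-guided diffusion model to produce the training images for that stage; that call is omitted here because the summary does not specify it.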

Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion
