Diffusion Curriculum: Synthetic-to-Real Generative Curriculum Learning via Image-Guided Diffusion

Yijun Liang, Shweta Bhardwaj, Tianyi Zhou
2024-10-18
Abstract: Low-quality or scarce data has posed significant challenges for training deep neural networks in practice. While classical data augmentation cannot contribute very different new data, diffusion models open up a new door to build self-evolving AI by generating high-quality and diverse synthetic data through text-guided prompts. However, text-only guidance cannot control synthetic images' proximity to the original images, resulting in out-of-distribution data detrimental to the model performance. To overcome the limitation, we study image guidance to achieve a spectrum of interpolations between synthetic and real images. With stronger image guidance, the generated images are similar to the training data but hard to learn. With weaker image guidance, the synthetic images are easier for the model but contribute to a larger distribution gap with the original data. The generated full spectrum of data enables us to build a novel "Diffusion Curriculum (DisCL)". DisCL adjusts the image guidance level of image synthesis for each training stage: it identifies and focuses on hard samples for the model and assesses the most effective guidance level of synthetic images to improve hard data learning. We apply DisCL to two challenging tasks: long-tail (LT) classification and learning from low-quality data. It focuses on lower-guidance images of high quality to learn prototypical features as a warm-up for learning higher-guidance images that might be weak on diversity or quality. Extensive experiments showcase a gain of 2.7% and 2.1% in OOD and ID macro-accuracy when applying DisCL to the iWildCam dataset. On ImageNet-LT, DisCL improves the base model's tail-class accuracy from 4.4% to 23.64% and leads to a 4.02% improvement in all-class accuracy.
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the poor training performance of deep neural networks in practice when the available data is low in quality or scarce. Specifically:
1. **Data Quality Issues**: In many real-world scenarios, data is collected in uncontrolled environments, so neither its quality nor its quantity is guaranteed. For example, images captured by field cameras, traffic cameras, sports cameras, or robot cameras may be degraded by lighting conditions, weather, motion blur, or object position.
2. **Data Imbalance Issues**: The number of samples per category may be highly imbalanced, leading to poor model performance on minority categories (i.e., tail categories).
3. **Data Distribution Gap Issues**: Low-quality or insufficient data can widen the distribution gap between training data and test data, which hurts the model's generalization.
To address these issues, the paper proposes a method called "Diffusion Curriculum" (DisCL). DisCL compensates for the deficiencies of the original data by generating high-quality and diverse synthetic data, and it gradually narrows the gap between synthetic and real data by adjusting the image guidance level used during generation. DisCL has two stages:
1. **Synthetic to Real Data Generation**: A pre-trained model identifies "hard samples" in the original data, and varying the image guidance level produces a full spectrum of data, from fully synthetic to nearly real.
2. **Generative Curriculum Learning**: At each stage of training, the synthetic data best suited to that stage is selected.
In this way, DisCL can adjust the quality, diversity, and difficulty of the data at different training stages, thereby improving the model's performance on hard data.
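The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the confidence threshold, and the fixed weak-to-strong schedule are all assumptions made for illustration. In particular, the abstract says DisCL assesses the most effective guidance level during training, whereas this sketch substitutes a simple non-adaptive schedule that steps from the weakest to the strongest guidance level.

```python
from typing import Dict, List, Sequence, Tuple


def find_hard_samples(confidences: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Return the ids of samples whose true-class confidence falls below `threshold`.

    `confidences` is assumed to come from a pre-trained model; the threshold
    value is a hypothetical choice, not one taken from the paper.
    """
    return sorted(sid for sid, conf in confidences.items() if conf < threshold)


def guidance_schedule(levels: Sequence[float], num_stages: int) -> List[float]:
    """Pick one image guidance level per training stage, weakest first.

    This is a fixed (non-adaptive) schedule used only as a stand-in for
    the curriculum: early stages use weakly guided, easier synthetic
    images, and later stages use strongly guided, near-real images.
    """
    if num_stages <= 0:
        raise ValueError("num_stages must be positive")
    ordered = sorted(levels)
    if num_stages == 1:
        return [ordered[-1]]
    last = len(ordered) - 1
    # Integer arithmetic spreads the stages evenly over the sorted levels.
    return [ordered[(stage * last) // (num_stages - 1)] for stage in range(num_stages)]


def stage_pool(
    hard_ids: Sequence[str],
    synthetic: Dict[Tuple[str, float], str],
    level: float,
) -> List[str]:
    """Collect the synthetic items generated for the hard samples at one guidance level.

    `synthetic` maps (sample id, guidance level) to a generated item such as
    a file path. How these items are mixed with real data is left out here.
    """
    return [synthetic[(sid, level)] for sid in hard_ids if (sid, level) in synthetic]


if __name__ == "__main__":
    hard = find_hard_samples({"a": 0.9, "b": 0.2, "c": 0.4})
    generated = {
        ("b", 0.0): "b_g0.png",
        ("b", 0.9): "b_g9.png",
        ("c", 0.0): "c_g0.png",
    }
    for level in guidance_schedule([0.0, 0.9], num_stages=2):
        print(level, stage_pool(hard, generated, level))
```

Running the script prints the synthetic items selected for each stage. Replacing `guidance_schedule` with a rule that picks the level based on validation feedback would be closer to the adaptive behaviour the abstract describes.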
The paper validates the effectiveness of DisCL on two challenging tasks: long-tail classification and learning from low-quality data. Experimental results show that DisCL significantly improves the model's performance on these tasks, especially in handling tail categories and low-quality data.