DiffuseMix: Label-Preserving Data Augmentation with Diffusion Models

Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood, Karthik Nandakumar
2024-04-05
Abstract: Recently, a number of image-mixing-based augmentation techniques have been introduced to improve the generalization of deep neural networks. In these techniques, two or more randomly selected natural images are mixed together to generate an augmented image. Such methods may not only omit important portions of the input images but also introduce label ambiguities by mixing images across labels, resulting in misleading supervisory signals. To address these limitations, we propose DiffuseMix, a novel data augmentation technique that leverages a diffusion model to reshape training images, supervised by our bespoke conditional prompts. First, a concatenation of a partial natural image and its generated counterpart is obtained, which helps avoid the generation of unrealistic images or label ambiguities. Then, to enhance resilience against adversarial attacks and improve safety measures, a randomly selected structural pattern from a set of fractal images is blended into the concatenated image to form the final augmented image for training. Our empirical results on seven different datasets reveal that DiffuseMix achieves superior performance compared to existing state-of-the-art methods on tasks including general classification, fine-grained classification, fine-tuning, data scarcity, and adversarial robustness. Augmented datasets and codes are available here:
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses several key issues in existing image-mixing data augmentation methods for training deep learning models. Specifically:

1. **Label Ambiguity**: Existing augmentation techniques generate new images by mixing images of different categories, which can make the label unclear and produce misleading supervision signals.
2. **Important Region Omission**: Existing techniques may drop important parts of the input images when mixing them.
3. **Cost and Limitations of Saliency Detection Dependence**: Some studies introduce saliency-detection-based methods to alleviate the above issues, but these methods are costly and limited in effectiveness.

To address these issues, the paper proposes DiffuseMix, a new data augmentation technique that generates images using a diffusion model. DiffuseMix is implemented through the following steps:

1. **Conditional Prompt Generation**: Generate images from the diffusion model using conditional prompts (e.g., "autumn scenery," "snow scene").
2. **Splicing Original and Generated Images**: Splice part of the original image with part of the generated image to form a mixed image that preserves key semantic information.
3. **Fractal Image Fusion**: Fuse a randomly selected fractal image with the mixed image to obtain the final augmented image for training, enhancing structural diversity and avoiding overfitting to the generated content.

Experimental results show that DiffuseMix outperforms existing state-of-the-art data augmentation methods on multiple benchmark datasets, with significant improvements in general classification, fine-grained classification, adversarial robustness, transfer learning, and data scarcity tasks.
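The splicing and fractal-fusion steps described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the generated image is assumed to have been produced elsewhere by a prompt-conditioned diffusion model, the splice is assumed to keep one half (left, right, top, or bottom) of the original, and the blend weight of 0.2 is an arbitrary illustrative value. The function names `concat_halves`, `blend_fractal`, and `augment` are hypothetical.

```python
import numpy as np


def concat_halves(original, generated, rng):
    """Keep one half of the original image and fill the rest from the generated one.

    Assumption: images are float arrays of shape (H, W, C) with matching sizes,
    and the kept half is chosen uniformly from left, right, top, or bottom.
    """
    h, w = original.shape[:2]
    mask = np.zeros((h, w, 1), dtype=np.float32)
    choice = rng.integers(4)
    if choice == 0:
        mask[:, : w // 2] = 1.0   # left half comes from the original
    elif choice == 1:
        mask[:, w // 2 :] = 1.0   # right half comes from the original
    elif choice == 2:
        mask[: h // 2] = 1.0      # top half comes from the original
    else:
        mask[h // 2 :] = 1.0      # bottom half comes from the original
    return mask * original + (1.0 - mask) * generated


def blend_fractal(hybrid, fractal, weight=0.2):
    """Linearly blend a fractal image into the spliced image.

    Assumption: `weight` is an illustrative value, not one taken from the paper.
    """
    return (1.0 - weight) * hybrid + weight * fractal


def augment(original, generated, fractals, rng, weight=0.2):
    """Splice original and generated images, then blend in a random fractal."""
    hybrid = concat_halves(original, generated, rng)
    fractal = fractals[rng.integers(len(fractals))]
    return blend_fractal(hybrid, fractal, weight)
```

Because part of the original image always survives the splice and no second class is mixed in, the augmented sample keeps the original label, which is the property the summary highlights.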