Enhance Image Classification via Inter-Class Image Mixup with Diffusion Model

Zhicai Wang, Longhui Wei, Tan Wang, Heyu Chen, Yanbin Hao, Xiang Wang, Xiangnan He, Qi Tian
2024-03-29
Abstract: Text-to-image (T2I) generative models have recently emerged as a powerful tool, enabling the creation of photo-realistic images and giving rise to a multitude of applications. However, the effective integration of T2I models into fundamental image classification tasks remains an open question. A prevalent strategy to bolster image classification performance is through augmenting the training set with synthetic images generated by T2I models. In this study, we scrutinize the shortcomings of both current generative and conventional data augmentation techniques. Our analysis reveals that these methods struggle to produce images that are both faithful (in terms of foreground objects) and diverse (in terms of background contexts) for domain-specific concepts. To tackle this challenge, we introduce an innovative inter-class data augmentation method known as Diff-Mix (
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The paper addresses how to improve domain-specific image classification by augmenting the training set with images produced by generative models. Existing generative and conventional data augmentation techniques struggle to produce images that are both faithful (the foreground object remains a realistic instance of its class) and diverse (the background context varies) for domain-specific concepts, which limits their usefulness as training data. To address this, the paper proposes Diff-Mix, an inter-class data augmentation method. Diff-Mix enriches the dataset by translating images between classes, which increases background diversity while preserving foreground fidelity. Experiments reported in the paper show that Diff-Mix improves performance across several image classification scenarios, including few-shot, conventional, and long-tail classification.
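The inter-class translation idea can be illustrated with a minimal sketch of the planning step only: each real image is paired with a different target class, so that an image editor could later redraw the foreground as the target class while keeping the source background. This is not the authors' implementation. The names `TranslationJob` and `plan_inter_class_jobs`, the `strength` field, and the default strength values are hypothetical, and the diffusion-based translation itself and any label assignment for the generated images are left out because the summary does not describe them.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TranslationJob:
    """One planned inter-class edit (hypothetical structure, not from the paper)."""
    source_index: int   # index of the real image that supplies the background
    source_label: int   # class of that real image
    target_label: int   # a different class to render in the foreground
    strength: float     # assumed edit strength in (0, 1]; illustrative only


def plan_inter_class_jobs(labels, num_classes, per_image=1,
                          strengths=(0.5, 0.7, 0.9), seed=0):
    """Pair every source image with randomly chosen classes other than its own."""
    if num_classes < 2:
        raise ValueError("inter-class translation needs at least two classes")
    rng = random.Random(seed)  # seeded so the plan is reproducible
    jobs = []
    for index, source_label in enumerate(labels):
        # Exclude the image's own class so every edit crosses a class boundary.
        candidates = [c for c in range(num_classes) if c != source_label]
        for _ in range(per_image):
            jobs.append(TranslationJob(
                source_index=index,
                source_label=source_label,
                target_label=rng.choice(candidates),
                strength=rng.choice(strengths),
            ))
    return jobs


if __name__ == "__main__":
    for job in plan_inter_class_jobs([0, 1, 2, 1], num_classes=3, per_image=2):
        print(job)
```

Each job would then be handed to whatever image-translation model is available; the plan only guarantees that the source and target classes differ, which is the property that lets one class appear against backgrounds borrowed from another.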