Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning

Rafael Elberg, Denis Parra, Mircea Petrache
2024-05-03
Abstract: Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models. We create a modified separable latent space to mix head and tail class examples. We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach. Code is available at
Computer Vision and Pattern Recognition, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses imbalanced (long-tailed) data distributions, particularly in medical image analysis and generation, and asks how data augmentation can improve performance on minority (tail) classes. Existing remedies such as resampling and traditional data augmentation have drawbacks: they can introduce bias, encourage overfitting, or produce unrealistic samples.

The paper proposes a data augmentation method based on feature space augmentation and Iterated Learning, operating in the latent space of pre-trained Stable Diffusion models. Through Iterated Learning of sparsified embeddings, it constructs a separable latent space in which examples from head and tail classes can be mixed. These underlying sparsified embeddings are applied to task-specific saliency maps through a K-NN approach, which selects and combines specific features of the data. The approach aims to avoid the interference that arises when features are combined directly in the latent space, and the information bottleneck it imposes is intended to promote better generalization.

Experiments show that, although the image generation quality of this method may be lower than that of other methods in some settings, it needs only a small number of diffusion inference steps and still produces competitive results on downstream classification tasks while maintaining high image quality. However, because of the way labels are assigned, classification errors can occur, and in some cases the original data labels are not preserved, which degrades classification performance.

In conclusion, the paper proposes a data augmentation strategy that leverages deep learning and prior knowledge of medical images to address data imbalance in long-tailed datasets. Future research will explore larger-scale datasets and different label assignment techniques to improve the method and broaden its applicability.
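To make the idea of mixing head and tail class examples through saliency maps and K-NN more concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names (`sparsify`, `knn_indices`, `mix_tail_with_head`), the top-k sparsification rule, the Euclidean distance, the saliency threshold, and the random stand-in data are all hypothetical. It also skips the Iterated Learning stage and the diffusion model entirely, and only shows one plausible way salient tail-class features could be combined with context from nearby head-class examples.

```python
import numpy as np

def sparsify(emb, keep):
    """Zero all but the `keep` largest-magnitude entries of each row.
    (Assumed stand-in for the learned sparsified embeddings.)"""
    emb = np.asarray(emb, dtype=float)
    idx = np.argsort(-np.abs(emb), axis=1)[:, :keep]
    out = np.zeros_like(emb)
    rows = np.arange(emb.shape[0])[:, None]
    out[rows, idx] = emb[rows, idx]
    return out

def knn_indices(query, bank, k):
    """Indices of the k rows of `bank` closest to `query` (Euclidean)."""
    dists = np.linalg.norm(bank - query, axis=1)
    return np.argsort(dists)[:k]

def mix_tail_with_head(tail_feat, tail_saliency, tail_emb,
                       head_feats, head_embs, k=3, threshold=0.5):
    """Keep tail features where saliency is high; elsewhere use the mean
    of the k nearest head-class feature maps.

    tail_feat: (C, H, W), tail_saliency: (H, W) with values in [0, 1],
    head_feats: (N, C, H, W), head_embs: (N, D), tail_emb: (D,).
    """
    neighbours = knn_indices(tail_emb, head_embs, k)
    head_context = head_feats[neighbours].mean(axis=0)      # (C, H, W)
    mask = (tail_saliency >= threshold).astype(float)[None]  # (1, H, W)
    return mask * tail_feat + (1.0 - mask) * head_context

# Toy usage with random stand-in data (no real images or model involved).
rng = np.random.default_rng(0)
head_feats = rng.normal(size=(10, 4, 8, 8))
head_embs = sparsify(rng.normal(size=(10, 16)), keep=4)
tail_feat = rng.normal(size=(4, 8, 8))
tail_emb = sparsify(rng.normal(size=(1, 16)), keep=4)[0]
saliency = np.zeros((8, 8))
saliency[2:6, 2:6] = 1.0  # pretend the class-relevant region is central

mixed = mix_tail_with_head(tail_feat, saliency, tail_emb,
                           head_feats, head_embs, k=3)
print(mixed.shape)
```

In this sketch the augmented feature map keeps the tail example's features inside the salient region and borrows the rest from similar head-class examples, which is one simple reading of "select and combine specific features". A real pipeline would decode the mixed features with the diffusion model and would need a rule for assigning the label of the result.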