Abstract: Image and multimodal machine learning tasks are very challenging to solve in the case of poorly distributed data. In particular, data availability and privacy restrictions exacerbate these hurdles in the medical domain. The state of the art in image generation quality is held by Latent Diffusion models, making them prime candidates for tackling this problem. However, a few key issues still need to be solved, such as the difficulty in generating data from under-represented classes and a slow inference process. To mitigate these issues, we propose a new method for image augmentation in long-tailed data based on leveraging the rich latent space of pre-trained Stable Diffusion Models. We create a modified separable latent space to mix head and tail class examples. We build this space via Iterated Learning of underlying sparsified embeddings, which we apply to task-specific saliency maps via a K-NN approach. Code is available at

What problem does this paper attempt to address?

The paper addresses imbalanced (long-tailed) data distributions, especially in medical image analysis and generation tasks, and asks how data augmentation can improve performance on minority classes. Existing remedies such as resampling and traditional data augmentation have drawbacks: they can introduce bias, overfit, or generate unrealistic samples.

The paper proposes a data augmentation method based on feature space augmentation and Iterated Learning that targets the latent space of pre-trained Stable Diffusion models. Through Iterated Learning of sparsified embeddings, it constructs a separable latent space in which examples from the head and tail classes can be mixed. The underlying sparsified embeddings are applied to task-specific saliency maps via a K-NN approach to select and combine specific features of the data. This design aims to avoid the interference that arises when features are combined directly in the latent space, and it promotes better generalization by restricting information through a bottleneck.

Experiments show that, although the image generation quality of this method may be lower than that of other methods in certain settings, it produces competitive results on downstream classification tasks with only a small number of diffusion inference steps while maintaining high image quality. However, the way labels are assigned can cause classification errors: in some cases the original data labels are not preserved, which hurts performance on the classification task.

In conclusion, the paper proposes a data augmentation strategy that leverages deep learning and prior knowledge of medical images to address data imbalance in long-tailed datasets. Future research will explore larger-scale datasets and different label assignment techniques to further improve the method and expand its applicability.
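The idea of mixing a tail-class example with its nearest head-class neighbours under a saliency mask can be illustrated with a small NumPy sketch. This is not the paper's implementation: it assumes latents are flat vectors rather than Stable Diffusion latents, that the saliency map is already given as per-feature weights in [0, 1], and it omits the Iterated Learning step and label assignment entirely. The names `knn_indices` and `mix_latents` are hypothetical.

```python
import numpy as np


def knn_indices(query, bank, k):
    """Return indices of the k rows of `bank` closest to `query` (Euclidean)."""
    dists = np.linalg.norm(bank - query, axis=1)
    return np.argsort(dists)[:k]


def mix_latents(tail_latent, head_latents, saliency, k=3):
    """Blend a tail-class latent with the mean of its k nearest head-class latents.

    `saliency` holds per-feature weights in [0, 1] (assumed given here):
    1 keeps the tail feature, 0 takes the head-neighbour feature.
    """
    idx = knn_indices(tail_latent, head_latents, k)
    head_mean = head_latents[idx].mean(axis=0)
    return saliency * tail_latent + (1.0 - saliency) * head_mean


# Toy usage: three head-class latents, one tail-class latent, 2 features.
head = np.array([[0.0, 0.0], [2.0, 2.0], [10.0, 10.0]])
tail = np.array([1.0, 0.0])
mask = np.array([1.0, 0.0])  # keep feature 0 from the tail, take feature 1 from the head
mixed = mix_latents(tail, head, mask, k=2)
print(mixed)
```

In this toy setup the mask decides, feature by feature, which source contributes; a real saliency map would come from a task-specific model rather than being set by hand.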

Long Tail Image Generation Through Feature Space Augmentation and Iterated Learning

Feature Space Augmentation for Long-Tailed Data

Data-Centric Long-Tailed Image Recognition

Leapfrog Latent Consistency Model (LLCM) for Medical Images Generation

AnomalyDiffusion: Few-Shot Anomaly Image Generation with Diffusion Model

Improved Generation of Synthetic Imaging Data Using Feature-Aligned Diffusion

Instance-Specific Semantic Augmentation for Long-Tailed Image Classification

Image generation via latent space learning using improved combination

Latent-based Diffusion Model for Long-tailed Recognition

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

DDFA: a displacement and diffusion-based feature augmentation method for imbalanced image recognition

DALDA: Data Augmentation Leveraging Diffusion Model and LLM with Adaptive Guidance Scaling

Norm-guided latent space exploration for text-to-image generation

SAU: A Dual-Branch Network to Enhance Long-Tailed Recognition via Generative Models

ProAug: Prototype-Based Augmentation for Long-Tailed Image Classification

Multiscale Latent Diffusion Model for Enhanced Feature Extraction from Medical Images

Text-Guided Diverse Image Synthesis for Long-Tailed Remote Sensing Object Classification

Augmenting medical image classifiers with synthetic data from latent diffusion models

SAFA: Sample-Adaptive Feature Augmentation for Long-Tailed Image Classification

Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images