Unleashing the Potential of Synthetic Images: A Study on Histopathology Image Classification

Leire Benito-Del-Valle, Aitor Alvarez-Gila, Itziar Eguskiza, Cristina L. Saratxaga
2024-09-24
Abstract: Histopathology image classification is crucial for the accurate identification and diagnosis of various diseases but requires large and diverse datasets. Obtaining such datasets, however, is often costly and time-consuming due to the need for expert annotations and ethical constraints. To address this, we examine the suitability of different generative models and image selection approaches to create realistic synthetic histopathology image patches conditioned on class labels. Our findings highlight the importance of selecting an appropriate generative model type and architecture to enhance performance. Our experiments over the PCam dataset show that diffusion models are effective for transfer learning, while GAN-generated samples are better suited for augmentation. Additionally, transformer-based generative models do not require image filtering, in contrast to those derived from Convolutional Neural Networks (CNNs), which benefit from realism score-based selection. Therefore, we show that synthetic images can effectively augment existing datasets, ultimately improving the performance of the downstream histopathology image classification task.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
The problem this paper addresses is that, in histopathology image classification, obtaining a large and diverse expert-annotated dataset is expensive and time-consuming, which restricts model training. To address this, the authors explore different generative models and image selection methods for creating realistic synthetic histopathology image patches conditioned on class labels. Specifically, the main objective of the paper is to evaluate how effectively Diffusion Probabilistic Models generate high-quality synthetic samples that expand the training set and improve prediction performance on the downstream histopathology image classification task.

### Core Problems of the Paper

1. **Difficulty in Obtaining Datasets**: Obtaining a large and diverse histopathology image dataset is expensive and time-consuming and requires expert annotation, which limits the training of deep-learning models.
2. **Quality and Diversity of Synthetic Data**: The generated synthetic images must be of high quality and diverse, so that low-quality or irrelevant synthetic images do not degrade model performance.
3. **Application Strategy of Synthetic Data**: Synthetic data must be integrated into the training of deep-learning models effectively, balancing the ratio of synthetic to real data to maximize the performance improvement on classification tasks.

### Research Methods

- **Selection of Generative Models**: Different generative models were studied, including Diffusion Models, Generative Adversarial Networks (GANs), and Transformer-based generative models.
- **Image Selection Method**: An image selection method based on the Realism Score and the Class-based Realism Score was proposed to filter out low-quality synthetic images.
- **Experimental Verification**: Experiments were carried out on the PatchCamelyon (PCam) dataset to evaluate the impact of different generative models and image selection methods on classification tasks.

### Main Contributions

1. **Application of Diffusion Models**: The use of Diffusion Probabilistic Models to generate synthetic histopathology images was proposed, and the effects of different backbone networks were evaluated.
2. **Image Selection Method**: Two post-processing methods based on the Realism Score were introduced to automatically discard low-quality or under-representative samples.
3. **Application Strategy of Synthetic Data**: The most appropriate way of using synthetic data was evaluated, including whether real and synthetic data are best used together or separately when training classification models.
4. **Experimental Results**: The effectiveness of the proposed methods was verified through a series of experiments, which in particular showed excellent performance in reducing mode collapse.

Through these studies, the paper demonstrates that synthetic images can effectively augment existing datasets and ultimately improve the performance of histopathology image classification tasks.
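The realism-score filtering mentioned above can be sketched roughly as follows. This summary does not define the score, so the sketch assumes the common k-nearest-neighbour formulation: for each synthetic feature vector, take the maximum over real feature vectors of (k-NN radius of the real vector) / (distance from the synthetic vector to that real vector), and treat a score of at least 1 as lying inside the estimated real-data manifold. The function names, the threshold of 1.0, the choice of `k`, and the reading of the class-based variant as "score against real features of the same class" are all illustrative assumptions, not the authors' implementation. Features are assumed to be precomputed by some embedding network.

```python
import numpy as np

def knn_radii(real_feats, k=3):
    """Distance from each real feature vector to its k-th nearest real neighbour."""
    d = np.linalg.norm(real_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    # After sorting each row, column 0 is the zero distance to itself,
    # so column k holds the k-th nearest neighbour distance.
    return np.sort(d, axis=1)[:, k]

def realism_scores(real_feats, synth_feats, k=3):
    """Assumed k-NN realism score: max over real points of radius / distance."""
    radii = knn_radii(real_feats, k)
    d = np.linalg.norm(synth_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    d = np.maximum(d, 1e-12)  # guard against division by zero
    return np.max(radii[None, :] / d, axis=1)

def select_by_realism(real_feats, synth_feats, threshold=1.0, k=3):
    """Indices of synthetic samples whose score reaches the (assumed) threshold."""
    scores = realism_scores(real_feats, synth_feats, k)
    return np.flatnonzero(scores >= threshold)

def select_by_class_realism(real_feats, real_labels, synth_feats, synth_labels,
                            threshold=1.0, k=3):
    """Hypothetical class-based variant: score each synthetic sample only
    against real features that share its class label."""
    keep = []
    for c in np.unique(synth_labels):
        idx = np.flatnonzero(synth_labels == c)
        scores = realism_scores(real_feats[real_labels == c], synth_feats[idx], k)
        keep.append(idx[scores >= threshold])
    return np.sort(np.concatenate(keep))

# Toy usage with 2-D "features": one synthetic point sits among the real
# points, the other lies far away from all of them.
real = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
real_y = np.array([0, 0, 1, 1])
synth = np.array([[0.5, 0.5], [10.0, 10.0]])
synth_y = np.array([0, 1])

kept = select_by_realism(real, synth, threshold=1.0, k=1)
kept_per_class = select_by_class_realism(real, real_y, synth, synth_y, k=1)
```

In this toy example only the first synthetic point is retained by either filter. The pairwise distance matrices are built in memory, so a real dataset of PCam's size would need batching or an approximate nearest-neighbour index.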