Abstract: The availability of training data is one of the main limitations in deep learning applications for medical imaging. Data augmentation is a popular approach to overcome this problem. A newer approach is machine learning-based augmentation, in particular the use of Generative Adversarial Networks (GANs). In this case, GANs generate images similar to those in the original dataset, increasing the overall amount of training data, which can improve the performance of the trained networks. A GAN model consists of two networks, a generator and a discriminator, interconnected in a feedback loop that creates a competitive environment. This work is a continuation of our previous research, in which we trained Nvidia's StyleGAN2-ADA on a limited COVID-19 chest X-ray image dataset. In this paper, we study how the performance of GAN-based augmentation depends on dataset size, with a focus on small samples. Two datasets are considered: one with 1,000 images per class (4,000 images in total) and one with 500 images per class (2,000 images in total). We train StyleGAN2-ADA on both sets. After validating the quality of the generated images, we use the trained GANs as one of the augmentation approaches in multi-class classification problems. We compare the GAN-based augmentation approach with two other approaches (classical augmentation and no augmentation at all) by employing transfer learning-based classification of COVID-19 chest X-ray images. The results are quantified using different classification quality metrics and compared with results from the literature. The GAN-based augmentation approach is found to be comparable to classical augmentation for medium and large datasets but underperforms on smaller datasets. A correlation between the size of the original dataset and the quality of classification is visible regardless of the augmentation approach.
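The generator/discriminator "competitive environment" described above is usually written as the minimax objective of the original GAN formulation (Goodfellow et al., 2014). In this objective, $D(x)$ is the discriminator's estimated probability that image $x$ is real, and $G(z)$ is an image produced by the generator from random noise $z$. The symbols $p_{\text{data}}$ and $p_z$ denote the real-image and noise distributions. It is shown here for intuition only. StyleGAN2-ADA trains with a non-saturating variant of this loss, plus regularization terms and adaptive discriminator augmentation, so its exact objective differs.

$$
\min_G \max_D \; V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim p_z}\left[\log\bigl(1 - D(G(z))\bigr)\right]
$$

The feedback loop alternates two kinds of update:

- The discriminator is updated to increase $V$, getting better at separating real from synthetic images.
- The generator is updated to decrease $V$, getting better at fooling the discriminator.

Once training is done, images sampled from the generator for fresh noise vectors are what GAN-based augmentation adds to the training set.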

What problem does this paper attempt to address?

The main problem this paper addresses is how data augmentation techniques perform on datasets of different sizes in medical image classification, specifically COVID-19 chest X-ray image classification. In particular, the authors ask two questions. First, can augmentation based on Generative Adversarial Networks (GANs) match traditional augmentation methods on small-sample datasets? Second, does it effectively improve the performance of deep learning models?

### Background of the Paper

- **Data Scarcity Problem**: Medical image data are difficult to obtain, mainly because of high costs, strict patient privacy protection, and the scarcity of data for certain diseases. Data augmentation has therefore become an important means of addressing this problem.
- **Data Augmentation Techniques**: Traditional data augmentation transforms the original images, for example by rotation, scaling, or brightness changes. GAN-based augmentation instead increases the amount of training data by generating new synthetic images.

### Research Objectives

- **Verify the Effect of GAN-based Augmentation**: Test experimentally whether GAN-based augmentation can match traditional augmentation on small-sample datasets.
- **Explore the Influence of Dataset Size**: Examine how dataset size affects classification performance, especially for small-sample datasets.

### Experimental Design

- **Datasets**: Two datasets were used. One contains 1,000 images per class (4,000 images in total) and the other contains 500 images per class (2,000 images in total).
- **Augmentation Methods**: Three approaches were compared: no augmentation, traditional augmentation, and GAN-based augmentation.
- **Evaluation Metrics**: Model performance was evaluated with several classification quality metrics: Accuracy, Precision, Recall, F1-score, Specificity, and Matthews Correlation Coefficient (MCC).

### Main Findings

- **Small-sample Datasets**: On the small dataset with 500 images per class, traditional augmentation outperformed GAN-based augmentation.
- **Medium- and Large-sized Datasets**: On the medium-sized dataset with 1,000 images per class, GAN-based augmentation performed comparably to traditional augmentation.
- **Overall Trend**: Regardless of the augmentation method used, classification performance generally improves as dataset size increases.

### Conclusions

- **Limitations of GAN-based Augmentation**: On small-sample datasets, GAN-based augmentation fails to significantly improve classification performance while requiring more computational resources and time.
- **Future Directions**: Nevertheless, GAN-based augmentation performs comparably to traditional methods on medium- and large-sized datasets. This suggests potential value for future medical data sharing and privacy protection.

Through these studies, the authors aim to provide more effective data augmentation strategies for medical image classification and to further promote the application of deep learning in medicine.
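All six evaluation metrics can be derived from a multi-class confusion matrix. The sketch below is a minimal pure-Python illustration built on several assumptions:

- Precision, Recall, F1-score and Specificity are computed one-vs-rest per class and then macro-averaged. The paper's exact averaging scheme is not stated here.
- MCC uses the multi-class formula (Gorodkin's R_K) that scikit-learn's `matthews_corrcoef` implements.
- The function names are hypothetical.
- The integer labels 0 to 3 stand in for the four classes, and the toy labels are placeholders, not results from the paper.

```python
import math
from statistics import mean
from typing import Dict, List, Sequence


def confusion_matrix(y_true: Sequence[int], y_pred: Sequence[int], n_classes: int) -> List[List[int]]:
    """Hypothetical helper: rows are true classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def classification_metrics(cm: List[List[int]]) -> Dict[str, float]:
    """Accuracy, macro-averaged one-vs-rest Precision/Recall/F1/Specificity, and multi-class MCC."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(k))
    true_counts = [sum(cm[i]) for i in range(k)]                       # row sums: support per class
    pred_counts = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # column sums: predictions per class

    precision, recall, f1, specificity = [], [], [], []
    for c in range(k):
        tp = cm[c][c]
        fp = pred_counts[c] - tp   # predicted as c, but actually another class
        fn = true_counts[c] - tp   # actually c, but predicted as another class
        tn = total - tp - fp - fn  # everything not involving class c
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
        specificity.append(tn / (tn + fp) if tn + fp else 0.0)

    # Multi-class MCC (Gorodkin's R_K), the same formula scikit-learn uses.
    cov_tp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_pp = total ** 2 - sum(p * p for p in pred_counts)
    cov_tt = total ** 2 - sum(t * t for t in true_counts)
    denom = math.sqrt(cov_pp * cov_tt)

    return {
        "accuracy": correct / total,
        "precision": mean(precision),
        "recall": mean(recall),
        "f1": mean(f1),
        "specificity": mean(specificity),
        "mcc": cov_tp / denom if denom else 0.0,
    }


if __name__ == "__main__":
    # Toy labels for four classes (placeholders, not data from the paper).
    y_true = [0, 0, 1, 1, 2, 2, 3, 3]
    y_pred = [0, 1, 1, 1, 2, 2, 3, 0]
    cm = confusion_matrix(y_true, y_pred, n_classes=4)
    for name, value in classification_metrics(cm).items():
        print(f"{name}: {value:.3f}")
```

Both datasets in the study are balanced, with an equal number of images per class. For balanced data, support-weighted averages of Recall coincide with the macro averages used here. Precision, F1-score and Specificity can still differ slightly between the two schemes, because prediction counts per class need not be equal. Printing the per-class lists as well as the averages can show which class benefits most from a given augmentation approach.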

Additional Look into GAN-based Augmentation for Deep Learning COVID-19 Image Classification

Performance of GAN-based augmentation for deep learning COVID-19 image classification

CovidGAN: Data Augmentation Using Auxiliary Classifier GAN for Improved Covid-19 Detection

Data Augmentation Using Generative Adversarial Networks (GANs) For GAN-Based Detection Of Pneumonia And COVID-19 In Chest X-Ray Images

GAN Augmentation: Augmenting Training Data using Generative Adversarial Networks

COVID-19 Classification Using Medical Image Synthesis by Generative Adversarial Networks

Generative Adversarial Networks for Data Augmentation

The use of generative adversarial networks in medical image augmentation

DACov: A Deeper Analysis of Data Augmentation on the Computed Tomography Segmentation Problem

Learning More with Less: GAN-based Medical Image Augmentation

Generative Adversarial Networks in Medical Image augmentation: A review

SAG-GAN: Semi-Supervised Attention-Guided GANs for Data Augmentation on Medical Images

Data Augmentation For Medical MR Image Using Generative Adversarial Networks

Performance Study of Image Data Augmentation by Generative Adversarial Networks

Leveraging GANs for data scarcity of COVID-19: Beyond the hype

Evaluating the Performance of StyleGAN2-ADA on Medical Images

Data Augmentation for Cardiac Magnetic Resonance Image Using Evolutionary GAN

Combating COVID-19 Using Generative Adversarial Networks and Artificial Intelligence for Medical Images: Scoping Review

Generating Realistic COVID19 X-rays with a Mean Teacher + Transfer Learning GAN

Cross-Modality Synthetic Data Augmentation using GANs: Enhancing Brain MRI and Chest X-ray Classification

A Critical Assessment of Generative Models for Synthetic Data Augmentation on Limited Pneumonia X-ray Data