Generation of synthetic data using breast cancer dataset and classification with resnet18

Dilsat Berin Aytar, Semra Gunduc

DOI: https://doi.org/10.48550/arXiv.2405.16286

2024-05-25

Abstract: Technology is advancing quickly in the modern information era, and data has become an essential resource in many fields. Collected, organized, and analyzed correctly, data is a potent tool for decision-making and process improvement across a wide range of sectors. Synthetic data is needed for several reasons, including the limited availability of real data, the expense of collecting labeled data, and privacy and security problems in specific situations and domains. Because of security, ethical, and legal restrictions, as well as sensitivity and privacy concerns, synthetic data is particularly valuable in the health sector. Generative Adversarial Networks (GANs) are deep learning models developed to generate synthetic data. In this study, the Breast Histopathology dataset was used to generate malignant- and negative-labeled synthetic patch images using MSG-GAN (Multi-Scale Gradients for Generative Adversarial Networks), a form of GAN, to aid in cancer identification. The ResNet18 model was then used to classify both synthetic and real data via transfer learning. Finally, the study examined whether the synthetic images behave like the real data and are comparable to it.

Machine Learning

What problem does this paper attempt to address?

The paper aims to address data insufficiency and privacy protection in breast cancer pathology image datasets by generating synthetic data to assist in cancer recognition. Specifically, the study uses MSG-GAN (a variant of Generative Adversarial Networks) to generate synthetic images labeled as malignant (IDC+) and non-malignant (IDC-) from a breast cancer histopathology dataset. A pre-trained ResNet18 model is then used to classify both synthetic and real data through transfer learning.

The main objectives of the study include:

1. **Generate high-fidelity synthetic images**: Use MSG-GAN to generate synthetic breast cancer pathology images that are highly similar to real images, overcoming the limitations and privacy issues of real data.
2. **Evaluate the quality of synthetic images**: Classify the generated synthetic images using the ResNet18 model to verify whether the synthetic images can mimic the behavior of real data and perform well in classification tasks.
3. **Improve the accuracy of cancer recognition**: Enhance the diversity and richness of the dataset by generating more synthetic data, thereby improving the model's performance in practical applications.

The paper validates the quality of synthetic data through four different classification experiments and evaluates the classification results using metrics such as accuracy, precision, recall, and F1 score. The results show that when synthetic data is used as the training set, the model can learn the data distribution well, and when real data is used for testing, the synthetic data can still simulate real data relatively well despite some differences. This indicates that synthetic data can, to some extent, replace real data for training and classification tasks.
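
The four metrics named above can be derived from the counts of a binary confusion matrix. The following is a minimal sketch in plain Python, not the paper's evaluation code. The function name `binary_metrics`, the `BinaryMetrics` container, the label encoding (1 for IDC+, 0 for IDC-), and the sample labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class BinaryMetrics:
    accuracy: float
    precision: float
    recall: float
    f1: float


def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    Assumption: labels are encoded as 1 (IDC+) and 0 (IDC-).
    Ratios with a zero denominator are reported as 0.0.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one sample is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return BinaryMetrics(accuracy, precision, recall, f1)


if __name__ == "__main__":
    # Hypothetical labels, not results from the paper.
    y_true = [1, 1, 1, 0, 0, 0, 0, 1]
    y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
    print(binary_metrics(y_true, y_pred))
```

Under this reading, the same function would be applied to each train/test configuration, so that a classifier trained on synthetic patches and one trained on real patches are scored on identical terms.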

A Comparative Analysis of the Novel Conditional Deep Convolutional Neural Network Model, Using Conditional Deep Convolutional Generative Adversarial Network-Generated Synthetic and Augmented Brain Tumor Datasets for Image Classification

Enhancing Histopathological Image Classification Performance through Synthetic Data Generation with Generative Adversarial Networks

Reliable Breast Cancer Diagnosis with Deep Learning: DCGAN-Driven Mammogram Synthesis and Validity Assessment

Prior-guided generative adversarial network for mammogram synthesis

Cross-Modality Synthetic Data Augmentation using GANs: Enhancing Brain MRI and Chest X-ray Classification

Synthesizing lesions using contextual GANs improves breast cancer classification on mammograms

Synthesis of diagnostic quality cancer pathology images by generative adversarial networks

Synthetic Boosted Resampling Using Deep Generative Adversarial Networks: A Novel Approach to Improve Cancer Prediction from Imbalanced Datasets

Skin Lesion Synthesis and Classification Using an Improved DCGAN Classifier

Synthetic Genitourinary Image Synthesis via Generative Adversarial Networks: Enhancing Artificial Intelligence Diagnostic Precision

Unleashing the Potential of Synthetic Images: A Study on Histopathology Image Classification

Synthetic Genitourinary Image Synthesis via Generative Adversarial Networks: Enhancing AI Diagnostic Precision

Hybrid Deep Learning Approach for Accurate Tumor Detection in Medical Imaging Data

How Good Are Synthetic Medical Images? An Empirical Study with Lung Ultrasound

Conditional Infilling GANs for Data Augmentation in Mammogram Classification

Improving classification results on a small medical dataset using a GAN; An outlook for dealing with rare disease datasets

Cancer-Net SCa-Synth: An Open Access Synthetically Generated 2D Skin Lesion Dataset for Skin Cancer Classification

Could We Generate Cytology Images from Histopathology Images? An Empirical Study

Synthetic Generation of Dermatoscopic Images with GAN and Closed-Form Factorization

Synthetic Histology Images for Training AI Models: A Novel Approach to Improve Prostate Cancer Diagnosis