Abstract: Making clinical decisions based on medical images is fundamentally an exercise in statistical decision-making, because the decision-maker must distinguish image features that are clinically diagnostic (i.e., signal) from a large number of non-diagnostic features (i.e., noise). To perform this task, the decision-maker must first have learned the underlying statistical distributions of the signal and the noise. The same is true for machine learning algorithms that perform a given diagnostic task. To train and test human experts or expert machine systems in any diagnostic or analytical task, it is advisable to use large sets of images, so as to capture the underlying statistical distributions adequately. Large numbers of images are also useful in clinical and scientific research into the underlying diagnostic process, which remains poorly understood. Unfortunately, it is often difficult to obtain medical images that fit a given description in sufficiently large numbers. This represents a significant barrier to progress in clinical care, education, and research. Here we describe a novel methodology that helps overcome this barrier. The method leverages the burgeoning technologies of deep learning (DL) and deep synthesis (DS) to synthesize medical images de novo. We provide a proof of principle of this approach using mammograms as an illustrative case. During the initial, prerequisite DL phase of the study, we trained a publicly available deep neural network (DNN), using open-source, radiologically vetted mammograms as labeled examples. During the subsequent DS phase, the fully trained DNN was made to synthesize, de novo, images that capture the image statistics of a given input image. The resulting images indicated that our DNN was able to faithfully capture the image statistics of visually diverse sets of mammograms.
We also briefly outline rigorous psychophysical testing methods for measuring the extent to which synthesized mammograms appear sufficiently similar to their original counterparts to human experts. These tests reveal that mammography experts fail to distinguish synthesized mammograms from their original counterparts at a statistically significant level, suggesting that the synthesized images were sufficiently realistic. Taken together, these results demonstrate that deep synthesis has the potential to be impactful in all fields in which medical images play a key role, most notably radiology and pathology.
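
As an illustration only, the following Python sketch shows one way a real-versus-synthesized discrimination test of this kind could be analyzed. The abstract does not specify the actual psychophysical procedure or statistics, so the two-alternative design, the chance level of 0.5, the function names, and all trial counts here are assumptions made for the example. It computes an exact two-sided binomial p-value against chance and the signal-detection sensitivity index d′, which quantifies how well an observer separates signal from noise.

```python
from math import comb
from statistics import NormalDist


def binomial_two_sided_p(correct: int, trials: int, chance: float = 0.5) -> float:
    """Exact two-sided binomial p-value for `correct` out of `trials`
    under the null hypothesis that the observer performs at `chance`."""
    pmf = [
        comb(trials, k) * chance**k * (1 - chance) ** (trials - k)
        for k in range(trials + 1)
    ]
    observed = pmf[correct]
    # Sum the probability of every outcome at least as unlikely as the observed one.
    return min(1.0, sum(p for p in pmf if p <= observed * (1 + 1e-9)))


def d_prime(hits: int, misses: int, false_alarms: int, correct_rejections: int) -> float:
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (add 0.5 to each count) keeps the rates
    away from exactly 0 or 1, where the z-transform is undefined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)


if __name__ == "__main__":
    # Hypothetical observer: 52 correct real-vs-synthesized judgments in 100 trials.
    print(binomial_two_sided_p(52, 100))
    # Hypothetical counts, treating "synthesized" as the signal class.
    print(d_prime(hits=27, misses=23, false_alarms=25, correct_rejections=25))
```

A p-value above the chosen significance threshold, together with a d′ near zero, would be consistent with observers being unable to tell synthesized images from originals. Failing to reject chance performance is not by itself proof of equivalence, so a real study would also need an adequate number of trials and observers.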

Deep Synthesis of Realistic Medical Images: A Novel Tool in Clinical Research and Training
