Improving classification results on a small medical dataset using a GAN; An outlook for dealing with rare disease datasets

Julia Röglin,Katharina Ziegeler,Jana Kube,Franziska König,Kay-Geert Hermann,Steffen Ortmann
DOI: https://doi.org/10.3389/fcomp.2022.858874
2022-08-25
Frontiers in Computer Science
Abstract:For clinical decision support systems, automated classification algorithms on medical image data have become more important in the past. For such computer vision problems, deep convolutional neural networks (DCNNs) have made breakthroughs. These often require large, annotated, and privacy-cleared datasets as a prerequisite for gaining high-quality results. This proves to be difficult with rare diseases due to limited incidences. Therefore, it is hard to sensitize clinical decision support systems to identify these diseases at an early stage. It has been shown several times, that synthetic data can improve the results of clinical decision support systems. At the same time, the greatest problem for the generation of these synthetic images is the data basis. In this paper, we present four different methods to generate synthetic data from a small dataset. The images are from 2D magnetic resonance tomography of the spine. The annotation resulted in 540 healthy, 47 conspicuously non-pathological, and 106 conspicuously pathological vertebrae. Four methods are presented to obtain optimal generation results in each of these classes. The obtained generation results are then evaluated with a classification net. With this procedure, we showed that adding synthetic annotated data has a positive impact on the classification results of the original data. In addition, one of our methods is appropriate to generate synthetic image data from <50 images. Thus, we found a general approach for dealing with small datasets in rare diseases, which can be used to build sensitized clinical decision support systems to detect and treat these diseases at an early stage.
What problem does this paper attempt to address?