Deep adversarial data augmentation for biomedical spectroscopy: Application to modelling Raman spectra of bone

Eleftherios Pavlou, Nikolaos Kourkoumelis
DOI: https://doi.org/10.1016/j.chemolab.2022.104634
IF: 4.175
2022-09-15
Chemometrics and Intelligent Laboratory Systems
Abstract: Deep learning algorithms have performed remarkably well in predicting state of health. Nevertheless, they typically rely on ample training data to avoid overfitting. In the biomedical sector, sufficient data are often unavailable because samples are scarce or hard to access. Data augmentation of physiological recordings can be achieved using Generative Adversarial Networks (GANs). A GAN is a computational framework for approximating generative models within an adversarial process, where two neural networks compete against each other while being trained simultaneously. Despite the widespread adoption of deep learning algorithms in life sciences, concerns have been raised about the lack of biological context. Therefore, to assess a data augmentation workflow, both computational and physiological quality metrics must be considered. Raman spectroscopy can be effectively used to study the molecular properties of bone tissue. Both inorganic and organic phases can be analysed simultaneously as probes of bone health status. In this work, we describe an easy-to-follow GAN approach for generating synthetic Raman spectra from a small dataset of ex vivo healthy and osteoporotic bone samples. The model was applied to raw Raman spectra, and it can be modified to produce any one-dimensional biomedical signal. We also introduce a novel unsupervised methodology to evaluate the variability of the synthetic dataset, based on successive Principal Component Analysis (PCA) modelling. The properties of the synthetic spectra were scrutinized by Fréchet Distance and difference spectroscopy, as well as by bone quality metrics such as mineral-to-matrix ratio and crystallinity. Finally, classification studies demonstrated the increased discrimination accuracy of the augmented dataset.
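As an illustration of the physiological quality metrics mentioned in the abstract, the following is a minimal sketch of a mineral-to-matrix ratio computed as a ratio of integrated band areas. The band limits (phosphate ν1 near 960 cm⁻¹ for the mineral phase, amide I near 1660 cm⁻¹ for the organic matrix), the function names, and the toy two-peak spectrum are assumptions made for this example, not the authors' definitions or data. The sketch also assumes a baseline-corrected spectrum.

```python
import math

def band_area(wavenumbers, intensities, lo, hi):
    """Trapezoidal area under the spectrum between lo and hi (cm^-1)."""
    area = 0.0
    for i in range(len(wavenumbers) - 1):
        x0, x1 = wavenumbers[i], wavenumbers[i + 1]
        if x0 >= lo and x1 <= hi:
            area += 0.5 * (intensities[i] + intensities[i + 1]) * (x1 - x0)
    return area

def mineral_to_matrix(wavenumbers, intensities,
                      mineral_band=(930.0, 980.0),    # assumed phosphate v1 window
                      matrix_band=(1620.0, 1700.0)):  # assumed amide I window
    """Ratio of the mineral band area to the organic matrix band area."""
    mineral = band_area(wavenumbers, intensities, *mineral_band)
    matrix = band_area(wavenumbers, intensities, *matrix_band)
    if matrix == 0:
        raise ValueError("matrix band area is zero; check the band limits")
    return mineral / matrix

def gaussian(x, center, width, height):
    """Gaussian peak used only to build a toy spectrum."""
    return height * math.exp(-0.5 * ((x - center) / width) ** 2)

# Toy spectrum (hypothetical): one mineral peak and one matrix peak, no baseline.
wavenumbers = [float(w) for w in range(800, 1801)]
intensities = [
    gaussian(w, 960.0, 8.0, 10.0) + gaussian(w, 1660.0, 20.0, 2.0)
    for w in wavenumbers
]

ratio = mineral_to_matrix(wavenumbers, intensities)
print(f"mineral-to-matrix ratio: {ratio:.2f}")
```

Computing the same metric on real and synthetic spectra and comparing the two distributions is one way to check that generated spectra remain physiologically plausible; other band choices (for example amide III or proline for the matrix) would change the absolute values.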
automation & control systems; computer science, artificial intelligence; instruments & instrumentation; statistics & probability; mathematics, interdisciplinary applications; chemistry, analytical