Robust classification and biomarker discovery of inherited metabolic diseases using GC-MS urinary metabolomics analysis combined with chemometrics

Nan Chen, Si Chen, Qi Zhang, Si-Rui Wang, Li-Juan Tang, Jian-Hui Jiang, Ru-Qin Yu, Yan-Ping Zhou
DOI: https://doi.org/10.1016/j.microc.2023.108600
IF: 5.304
2023-03-08
Microchemical Journal
Abstract: Early diagnosis and treatment of inherited metabolic diseases (IMDs) are crucial for reducing the neonatal mortality rate and improving quality of life in children. Discovering disease-related biomarkers that objectively reflect the underlying pathophysiological changes is vital to improving the prognosis of IMDs. In this study, we collected 90 clinical urine samples from newborns, comprising healthy samples and two types of IMDs: glutaric aciduria type I (GA I) and propionic acidemia (PA). A total of 132 metabolites were identified using gas chromatography-mass spectrometry (GC-MS). We then proposed an integrated chemometrics strategy that embeds discrete particle swarm optimization (DPSO) into a stacked autoencoder (SAE), forming a framework called DPSO-SAE for the analysis of GC-MS metabolomics data. SAE is known for its excellent non-linear feature learning ability. Introducing DPSO allows SAE to support biomarker discovery and improves its classification performance by jointly optimizing the variable combination and the number of neurons used in SAE modeling. We applied DPSO-SAE to the data and compared it with random forest (RF), partial least squares discriminant analysis (PLSDA) and conventional SAE. DPSO-SAE achieved superior performance, with high classification accuracy and good generalization ability. We further demonstrated the robustness of DPSO-SAE in variable selection and verified the statistical significance of the identified marker metabolites that account for IMD classification. Six potential biomarkers were confirmed: 3-methylglutaconic, 3-OH-propionic, methylcitric, methylmalonic and uric acids for PA, and glutaric acid for GA I. All results indicate that the proposed DPSO-SAE strategy is feasible for robust classification and biomarker discovery of IMDs, and it may provide a valuable modeling algorithm for metabolomics studies.
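The variable-selection idea described in the abstract can be illustrated with a minimal sketch of a binary (discrete) particle swarm search over feature masks. This is not the authors' DPSO-SAE implementation. Several assumptions apply: the data are synthetic, a nearest-centroid classifier stands in for the SAE, the sigmoid transfer rule is the commonly used binary PSO update rather than necessarily the paper's variant, and all names (`make_data`, `fitness`, `binary_pso`) and parameter values are hypothetical. The sketch also omits the joint optimization of neuron numbers mentioned in the abstract.

```python
import math
import random

def make_data(rng, n_per_class=30, n_features=12, informative=(0, 1, 2)):
    """Synthetic two-class data (assumption): only `informative` columns differ."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 3.0 * label  # shift class 1 on informative columns
            X.append(row)
            y.append(label)
    return X, y

def fitness(mask, train, valid, penalty=0.01):
    """Validation accuracy of a nearest-centroid stand-in classifier on the
    selected columns, minus a small penalty per selected variable."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0
    (Xtr, ytr), (Xva, yva) = train, valid
    centroids = {}
    for label in set(ytr):
        rows = [x for x, t in zip(Xtr, ytr) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in cols]
    correct = 0
    for x, t in zip(Xva, yva):
        pred = min(centroids, key=lambda c: sum(
            (x[j] - m) ** 2 for j, m in zip(cols, centroids[c])))
        correct += pred == t
    return correct / len(yva) - penalty * len(cols)

def binary_pso(score, n_bits, rng, n_particles=20, n_iter=40,
               w=0.7, c1=1.5, c2=1.5, v_max=4.0):
    """Binary PSO: velocities are real-valued, positions are 0/1 masks sampled
    through a sigmoid of the velocity."""
    pos = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(n_particles)]
    vel = [[0.0] * n_bits for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [score(p) for p in pos]
    g = max(range(n_particles), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(n_iter):
        for i in range(n_particles):
            for j in range(n_bits):
                v = (w * vel[i][j]
                     + c1 * rng.random() * (pbest[i][j] - pos[i][j])
                     + c2 * rng.random() * (gbest[j] - pos[i][j]))
                vel[i][j] = max(-v_max, min(v_max, v))  # clamp velocity
                prob = 1.0 / (1.0 + math.exp(-vel[i][j]))
                pos[i][j] = 1 if rng.random() < prob else 0
            f = score(pos[i])
            if f > pbest_fit[i]:  # update personal, then global best
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return gbest, gbest_fit

# Usage: random half/half split, then search for a good variable subset.
rng = random.Random(0)
X, y = make_data(rng)
order = list(range(len(y)))
rng.shuffle(order)
half = len(order) // 2
train = ([X[i] for i in order[:half]], [y[i] for i in order[:half]])
valid = ([X[i] for i in order[half:]], [y[i] for i in order[half:]])
mask, best = binary_pso(lambda m: fitness(m, train, valid),
                        n_bits=len(X[0]), rng=rng)
print("selected columns:", [j for j, bit in enumerate(mask) if bit],
      "score:", round(best, 3))
```

In a setting closer to the abstract, the fitness function would presumably train and validate an SAE-based classifier on the selected metabolites, and additional bits in each particle could encode the hidden-layer sizes; both of those are assumptions rather than details given in the source.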
chemistry, analytical