Systematic characterization and efficient prediction of cobalamin C deficiency clinical phenotypes using network analysis and deep learning on multi-omics data
Ze-Yu Li, Xiao-Ying Liu, Wen Xiao, Jiang-Tao Yang, Pan-Pan Jiang, Ben-Qing Wu, Xiang-Ju Liu, Ming Xue, Hui-Jing Lv, Shi-Hao Zhou, Qin Yang, Lu Xu, Yan-Ling Yang
DOI: https://doi.org/10.1016/j.microc.2024.112018
IF: 5.304
2024-11-02
Microchemical Journal
Abstract: As a monogenic disease, cobalamin C (cblC) deficiency lacks a clear correlation between pathogenic gene mutations and its spectrum of disease phenotypes, which calls for an understanding of the molecular mechanisms by which diverse clinical phenotypes emerge. This work aimed to disentangle the phenotypic complexity of cblC deficiency via network analysis and deep learning on multi-omics (proteomics and metabolomics) data. For this purpose, a novel computational framework was developed to systematically characterize and efficiently predict clinical phenotypes of cblC deficiency, utilizing a Connect the Dots (CTD)-based Hybrid data Structural Representation of each patient and graph convolutional network (GCN)-based Multi-Omics Learning (CTDHSR-GCNMOL). The CTD algorithm enabled the identification of relevant perturbed proteins or metabolites and the construction of clinical phenotype-specific co-perturbation networks. The GCN allowed efficient learning of subtle change patterns across clinical phenotypes, drawing not only on the hybrid feature description (Euclidean structure hybridized with non-Euclidean structure) of each patient but also on the interactions between patients captured by a sample similarity network. Evaluated on three clinical phenotypes (epilepsy, developmental delay and metabolic syndrome), CTDHSR-GCNMOL identified subsets of perturbed proteins or metabolites highly specific to each clinical phenotype and established a main disease module (network) for each phenotype for systematic characterization. For proteomics, epilepsy was characterized by the dysregulation of the previously reported proteins TAGLN2, SH3BGRL3 and LTA4H, and developmental delay was characterized by the dysregulation of the previously reported proteins HSP90AB1, PRDX1, GDI2, VIM, PNP and BLVRA, with high confidence (selection frequencies).
For untargeted metabolomics in negative ion mode, the disease status of metabolic syndrome could be well interpreted by the disturbed pathways of the 20 top-ranked perturbed metabolites, all of which have been reported in previous studies to be closely related to its pathogenesis. These disturbed pathways included butanoate metabolism; purine metabolism; alanine, aspartate and glutamate metabolism; pyrimidine metabolism; fructose and mannose metabolism; galactose metabolism; amino sugar and nucleotide sugar metabolism; and steroid hormone biosynthesis. By hybridizing the abundances of perturbed proteins and metabolites with the topological structures of patient-specific perturbation networks, CTDHSR-GCNMOL achieved the desired prediction performance across the three clinical phenotypes and outperformed the traditional block PLSDA. These findings verified the effectiveness of CTDHSR-GCNMOL in gaining useful insights into the phenotypic complexity of cblC deficiency and in guiding targeted treatment strategies.
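The idea of learning over a sample similarity network can be illustrated with a minimal sketch. The abstract does not state how the similarity network is built or which GCN variant is used, so everything here is an assumption: edges come from thresholded cosine similarity between patient feature vectors, and the layer uses the common symmetric-normalized propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The function names `similarity_adjacency` and `gcn_layer`, and the threshold value, are hypothetical and not taken from CTDHSR-GCNMOL.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_adjacency(X, threshold=0.9):
    """Build an undirected patient-patient graph (assumed construction):
    connect two patients when their cosine similarity reaches the threshold."""
    n = len(X)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(X[i], X[j]) >= threshold:
                A[i][j] = A[j][i] = 1.0
    return A


def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    n = len(A)
    # Add self-loops so each patient keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]  # always >= 1 because of the self-loops
    norm = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbour features, then apply the weight matrix and ReLU.
    agg = [[sum(norm[i][k] * H[k][f] for k in range(n)) for f in range(len(H[0]))]
           for i in range(n)]
    return [[max(0.0, sum(agg[i][f] * W[f][o] for f in range(len(W))))
             for o in range(len(W[0]))]
            for i in range(n)]


# Toy data: three patients, two features each (illustrative values only).
X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
A = similarity_adjacency(X, threshold=0.9)
W = [[1.0, 0.0], [0.0, 1.0]]  # identity weights, to expose the smoothing effect
H1 = gcn_layer(A, X, W)
```

In this toy case the first two patients are linked and their representations are averaged, while the third, unconnected patient keeps its original features. A trained model would learn `W` and stack several such layers before a classification output.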
chemistry, analytical