Identification of diagnostic biomarkers and molecular subtype analysis associated with m6A in Tuberculosis immunopathology using machine learning

Shoupeng Ding, Jinghua Gao, Chunxiao Huang, Yuyang Zhou, Yimei Yang, Zihan Cai
DOI: https://doi.org/10.1038/s41598-024-81790-4
IF: 4.6
2024-12-04
Scientific Reports
Abstract: Tuberculosis (TB), ranking just below COVID-19 in global mortality, is a highly complex infectious disease involving intricate immunological molecules, diverse signaling pathways, and multifaceted immune processes. N6-methyladenosine (m6A), a critical epigenetic modification, regulates various immune-metabolic and pathological pathways, though its precise role in TB pathogenesis remains largely unexplored. This study aims to identify m6A-associated genes implicated in TB, elucidate their mechanistic contributions, and evaluate their potential as diagnostic biomarkers and tools for molecular subtyping. Using TB-related datasets from the GEO database, this study identified differentially expressed genes associated with m6A modification. We applied four machine learning algorithms (Random Forest, Support Vector Machine, Extreme Gradient Boosting, and Generalized Linear Model) to construct diagnostic models focusing on m6A regulatory genes. The Random Forest algorithm was selected as the optimal model based on performance metrics (area under the curve [AUC] = 1.0, p < 0.01), and a clinical predictive model was developed based on these critical genes. Patients were stratified into distinct subtypes according to m6A gene expression profiles, followed by immune infiltration analysis across subtypes. Additionally, Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analyses elucidated the biological functions and pathways associated with the identified genes. Quantitative real-time PCR (RT-qPCR) was used to validate the expression of key m6A regulatory genes. Analysis of the GSE83456 dataset revealed four differentially expressed m6A-related genes (YTHDF1, HNRNPC, LRPPRC, and ELAVL1) identified as critical m6A regulators in TB through the Random Forest model. The diagnostic significance of these genes was further supported by a nomogram, achieving a high predictive accuracy (95% confidence interval [CI]: 0.87–0.94).
Consensus clustering classified patients into two m6A subtypes with distinct immune profiles, as principal component analysis (PCA) showed significantly higher m6A scores in Group A than in Group B (p < 0.05). Immune infiltration analysis highlighted significant correlations between key m6A genes and specific immune cell infiltration patterns across subtypes. This study highlights the potential of key m6A regulatory genes as diagnostic biomarkers and immunotherapy targets for TB, supporting their role in TB pathogenesis. Future research should aim to further validate these findings across diverse cohorts to enhance their clinical applicability.
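The abstract reports model quality as an area under the ROC curve (AUC). As a minimal sketch of what that metric measures, assuming binary labels (1 = TB, 0 = control) and made-up classifier scores that are not data from the study, AUC can be computed as the fraction of positive/negative pairs in which the positive sample receives the higher score. The function name `roc_auc` and the example values are illustrative assumptions, not the authors' code.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outranks a random negative.

    A tie between a positive and a negative score counts as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores (not from the study): 1 = TB, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every TB score exceeds every control score
overlapping = [0.9, 0.4, 0.7, 0.5, 0.2, 0.1]  # one TB sample scores below one control

auc_separated = roc_auc(labels, separated)
auc_overlapping = roc_auc(labels, overlapping)
```

An AUC of 1.0, as reported for the Random Forest model, corresponds to the first case: every TB sample scores above every control. In the second case, one misordered pair out of nine lowers the value to 8/9.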
multidisciplinary sciences