Integrative analysis identifies oxidative stress biomarkers in non-alcoholic fatty liver disease via machine learning and weighted gene co-expression network analysis
Haining Wang, Wei Cheng, Ping Hu, Tao Ling, Chao Hu, Yongzhen Chen, Yanan Zheng, Junqi Wang, Ting Zhao, Qiang You
DOI: https://doi.org/10.3389/fimmu.2024.1335112
IF: 7.3
2024-02-28
Frontiers in Immunology
Abstract: Background: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease globally and can progress to non-alcoholic steatohepatitis (NASH), cirrhosis, and even hepatocellular carcinoma. Given the absence of effective treatments to halt its progression, novel molecular approaches to NAFLD diagnosis and treatment are of paramount importance. Methods: We downloaded oxidative stress-related genes from the GeneCards database and retrieved NAFLD-related datasets from the GEO database. Using the Limma R package and Weighted Gene Co-expression Network Analysis (WGCNA), we identified differentially expressed genes closely associated with NAFLD. Intersecting the oxidative stress-related genes, the NAFLD-related genes, and the genes closely associated with NAFLD as identified through WGCNA yielded 31 intersection genes. From these 31 genes shared between NAFLD and oxidative stress (OS), we identified three hub genes using three machine learning algorithms: Least Absolute Shrinkage and Selection Operator (LASSO) regression, Support Vector Machine Recursive Feature Elimination (SVM-RFE), and Random Forest. A nomogram was then used to predict the incidence of NAFLD. The CIBERSORT algorithm was employed for immune infiltration analysis, single-sample Gene Set Enrichment Analysis (ssGSEA) for functional enrichment analysis, and Protein-Protein Interaction (PPI) networks to explore the relationships between the three hub genes and the other intersecting genes of NAFLD and OS. The distribution of the three hub genes across six cell clusters was determined using single-cell RNA sequencing.
Finally, relevant data from the Attie Lab Diabetes Database, together with Western blot (WB) and reverse transcription quantitative polymerase chain reaction (RT-qPCR) assays on liver tissues from a NASH mouse model, further validated the significant roles of CDKN1B and TFAM in NAFLD. Results: We identified 31 genes strongly associated with oxidative stress in NAFLD. Subsequent machine learning analysis and external validation pinpointed two genes, CDKN1B and TFAM, as most closely correlated with oxidative stress in NAFLD. Conclusion: This investigation identified two hub genes with potential as novel targets for the diagnosis and treatment of NAFLD, offering new perspectives for its clinical management.
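The gene-selection logic described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' code (the study used R tools such as Limma and WGCNA). It assumes that candidate genes are the intersection of three gene lists and, as a further assumption that the abstract does not state explicitly, that hub genes are those selected by all three machine learning algorithms. All gene names (`GENE_A` to `GENE_G`) and the helper `intersect_gene_sets` are hypothetical placeholders.

```python
from functools import reduce


def intersect_gene_sets(*gene_sets):
    """Return the genes present in every input collection, sorted by name."""
    if not gene_sets:
        return []
    common = reduce(set.intersection, (set(genes) for genes in gene_sets))
    return sorted(common)


# Hypothetical toy inputs standing in for the three gene lists.
oxidative_stress_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_E"}
nafld_related_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_F"}
wgcna_associated_genes = {"GENE_A", "GENE_B", "GENE_C", "GENE_D", "GENE_G"}

# Step 1: candidate genes shared by all three lists.
candidates = intersect_gene_sets(
    oxidative_stress_genes, nafld_related_genes, wgcna_associated_genes
)

# Hypothetical feature subsets, as if returned by LASSO, SVM-RFE,
# and a random forest importance ranking.
lasso_selected = {"GENE_A", "GENE_B", "GENE_C"}
svm_rfe_selected = {"GENE_A", "GENE_B", "GENE_C", "GENE_D"}
random_forest_selected = {"GENE_A", "GENE_B", "GENE_C", "GENE_E"}

# Step 2 (assumed consensus rule): keep candidates chosen by every algorithm.
hub_genes = intersect_gene_sets(
    candidates, lasso_selected, svm_rfe_selected, random_forest_selected
)

print("Candidate genes:", candidates)
print("Hub genes:", hub_genes)
```

Requiring agreement across all three selectors is a conservative choice: a gene favoured by only one algorithm is dropped, which trades sensitivity for robustness.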
immunology