Identification of potential feature genes in non-alcoholic fatty liver disease using bioinformatics analysis and machine learning strategies
Zhaohui Zhang, Shihao Wang, Zhengwen Zhu, Biao Nie
DOI: https://doi.org/10.1016/j.compbiomed.2023.106724
Abstract: The prevalence of non-alcoholic fatty liver disease (NAFLD) and NAFLD-associated hepatocellular carcinoma (HCC) has increased continuously in recent years. Machine learning is an effective method for screening the feature genes of a disease for prediction, prevention and personalized treatment. Here, we used the "limma" package and weighted gene co-expression network analysis (WGCNA) to screen 219 NAFLD-related genes and found that they were mainly enriched in inflammation-related pathways. Four feature genes (AXUD1, FOSB, GADD45B, and SOCS2) were then selected using two machine learning algorithms: LASSO regression and support vector machine-recursive feature elimination (SVM-RFE). A clinical diagnostic model was subsequently constructed with an area under the curve (AUC) value of 0.994, which was superior to other indicators of NAFLD. Feature gene expression correlated significantly with steatohepatitis histology and with clinical variables. These findings were also validated in external datasets and a mouse model. Finally, we found that feature gene expression was significantly decreased in NAFLD-associated HCC and that SOCS2 may be a prognostic biomarker. Our findings may provide new insights into the diagnosis, prevention and treatment targets of NAFLD and NAFLD-associated HCC.
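The selection step named in the abstract (keeping genes chosen by both LASSO and SVM-RFE, then scoring a diagnostic model by AUC) can be illustrated with a minimal sketch. This is not the authors' code: it assumes Python with scikit-learn rather than the R packages the abstract mentions, uses synthetic data in place of the expression matrix, and the gene names, the number of features kept by RFE (8), and the final logistic regression classifier are all hypothetical choices made for illustration. LassoCV here treats the 0/1 label as a numeric response, which is a simplification.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LassoCV, LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for an expression matrix (samples x candidate genes).
# Assumption: 30 candidate genes, 4 of them informative; not the paper's data.
X, y = make_classification(
    n_samples=120, n_features=30, n_informative=4, n_redundant=0,
    class_sep=2.0, shuffle=False, random_state=0,
)
genes = np.array([f"gene_{i}" for i in range(X.shape[1])])  # hypothetical names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# LASSO: keep genes whose coefficient is non-zero at the cross-validated penalty.
lasso = LassoCV(cv=5, random_state=0).fit(X_train_s, y_train)
lasso_genes = set(genes[lasso.coef_ != 0])

# SVM-RFE: recursively drop the lowest-weighted gene of a linear SVM.
rfe = RFE(SVC(kernel="linear"), n_features_to_select=8, step=1)
rfe.fit(X_train_s, y_train)
svm_genes = set(genes[rfe.support_])

# Feature genes are those selected by both methods.
feature_genes = sorted(lasso_genes & svm_genes)
print("feature genes:", feature_genes)

# Fit a simple diagnostic model on the shared genes and score it by AUC.
auc = None
if feature_genes:
    idx = [i for i, g in enumerate(genes) if g in feature_genes]
    model = LogisticRegression(max_iter=1000).fit(X_train_s[:, idx], y_train)
    scores = model.predict_proba(X_test_s[:, idx])[:, 1]
    auc = roc_auc_score(y_test, scores)
    print("held-out AUC:", round(auc, 3))
```

Taking the intersection of the two selectors is a conservative choice: a gene is kept only if a sparse linear model and a margin-based ranking both find it useful. The AUC is computed on a held-out split so it is not inflated by the selection step.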