Identification of hub genes and their correlation with immune infiltration in coronary artery disease through bioinformatics and machine learning methods
Ke-Ke Huang,Hui-Lei Zheng,Shuo Li,Zhi-Yu Zeng
DOI: https://doi.org/10.21037/jtd-22-632
Abstract:
Background: Coronary artery disease (CAD) is a multifactorial disease whose pathogenesis remains unclear. We aimed to identify the optimal feature genes (OFGs) for CAD and to investigate the role of immune cell infiltration in CAD. This may improve understanding of the pathogenesis of CAD and support the development of genetic prediction of CAD.
Methods: Datasets related to CAD were obtained from the Gene Expression Omnibus (GEO) database. Cases in the datasets met diagnostic criteria including clinical symptoms and electrocardiographic (ECG) and angiographic evidence. We identified differentially expressed genes (DEGs) and conducted functional enrichment analysis. OFGs were obtained using the least absolute shrinkage and selection operator (LASSO) algorithm, the support vector machine recursive feature elimination (SVM-RFE) algorithm, and the random forest (RF) algorithm. CIBERSORT was used to compare immune infiltration between CAD patients and normal controls, and the correlation between OFGs and immune cells was analyzed.
Results: DEGs were involved in the interleukin (IL)-17 signaling pathway, the nuclear factor (NF)-kappa B signaling pathway, and the tumor necrosis factor (TNF) signaling pathway. Gene Ontology (GO) analysis revealed that DEGs were enriched in lipopolysaccharide (LPS), tertiary granule, and pattern recognition receptor activity. Disease Ontology (DO) analysis suggested that DEGs were enriched in lung disease and arteriosclerotic cardiovascular disease (CVD). Matrix metalloproteinase 9 (MMP9), Pellino E3 ubiquitin protein ligase 1 (PELI1), thrombomodulin (THBD), and zinc finger protein 36 (ZFP36) were identified as the intersection of the OFGs obtained from the LASSO, SVM-RFE, and RF algorithms. CAD patients had lower proportions of memory B cells (P=0.019), CD8 T cells (P<0.001), resting memory CD4 T cells (P<0.001), regulatory T cells (P=0.028), and gamma delta T cells (P<0.001) than normal controls, whereas the proportions of activated memory CD4 T cells (P=0.014), resting natural killer (NK) cells (P<0.001), monocytes (P<0.001), M0 macrophages (P=0.023), activated mast cells (P<0.001), and neutrophils (P<0.001) were higher in CAD patients than in normal controls. MMP9, PELI1, THBD, and ZFP36 were correlated with immune cells.
Conclusions: MMP9, PELI1, THBD, and ZFP36 may be predictive biomarkers for CAD. The OFGs and their association with immune infiltration may provide potential biomarkers for CAD prediction and support better assessment of the disease.
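The screening step described in the abstract, keeping only the genes selected by all three algorithms, amounts to a set intersection. The following is a minimal sketch of that step only, not the authors' code: the function name `intersect_feature_sets` is hypothetical, and the per-method lists are invented for illustration (the `GENE_A` to `GENE_D` entries are placeholders, not genes reported in the paper).

```python
def intersect_feature_sets(selections):
    """Return the genes chosen by every method, sorted alphabetically.

    `selections` maps a method name to the list of genes it selected.
    """
    if not selections:
        return []
    common = set.intersection(*(set(genes) for genes in selections.values()))
    return sorted(common)


# Hypothetical per-method selections. Only the four shared genes come from
# the abstract; GENE_A..GENE_D are made-up placeholders for illustration.
selections = {
    "LASSO": ["MMP9", "PELI1", "THBD", "ZFP36", "GENE_A", "GENE_B"],
    "SVM-RFE": ["MMP9", "PELI1", "THBD", "ZFP36", "GENE_B", "GENE_C"],
    "RF": ["MMP9", "PELI1", "THBD", "ZFP36", "GENE_A", "GENE_D"],
}

hub_genes = intersect_feature_sets(selections)
print(hub_genes)  # ['MMP9', 'PELI1', 'THBD', 'ZFP36']
```

Requiring agreement across all three methods is a conservative choice: a gene that only one or two algorithms favor (such as the placeholders here) is dropped, which trades sensitivity for robustness to the quirks of any single selector.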