Machine learning-based prediction of candidate gene biomarkers correlated with immune infiltration in patients with idiopathic pulmonary fibrosis
Yufeng Zhang, Cong Wang, Qingqing Xia, Weilong Jiang, Huizhe Zhang, Ehsan Amiri-Ardekani, Haibing Hua, Yi Cheng
DOI: https://doi.org/10.3389/fmed.2023.1001813
IF: 3.9
2023-02-13
Frontiers in Medicine
Abstract:
Objective: This study aimed to identify candidate gene biomarkers associated with immune infiltration in idiopathic pulmonary fibrosis (IPF) based on machine learning algorithms.
Methods: Microarray datasets of IPF were extracted from the Gene Expression Omnibus (GEO) database to screen for differentially expressed genes (DEGs). The DEGs were subjected to enrichment analysis, and two machine learning algorithms were used to identify candidate genes associated with IPF. These genes were verified in a validation cohort from the GEO database. Receiver operating characteristic (ROC) curves were plotted to assess the predictive value of the IPF-associated genes. The cell-type identification by estimating relative subsets of RNA transcripts (CIBERSORT) algorithm was used to evaluate the proportion of immune cells in IPF and normal tissues. Additionally, the correlation between the expression of IPF-associated genes and the infiltration levels of immune cells was examined.
Results: A total of 302 upregulated and 192 downregulated genes were identified. Functional annotation, pathway enrichment, Disease Ontology and gene set enrichment analyses revealed that the DEGs were related to the extracellular matrix and immune responses. COL3A1, CDH3, CEBPD, and GPIHBP1 were identified as candidate biomarkers using machine learning algorithms, and their predictive value was verified in a validation cohort. Additionally, ROC analysis revealed that the four genes had high predictive accuracy. The infiltration levels of plasma cells, M0 macrophages and resting dendritic cells were higher and those of resting natural killer (NK) cells, M1 macrophages and eosinophils were lower in the lung tissues of patients with IPF than in those of healthy individuals. The expression of the abovementioned genes was correlated with the infiltration levels of plasma cells, M0 macrophages and eosinophils.
Conclusion: COL3A1, CDH3, CEBPD, and GPIHBP1 are candidate biomarkers of IPF. Plasma cells, M0 macrophages and eosinophils may be involved in the development of IPF and may serve as immunotherapeutic targets in IPF.
medicine, general & internal
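The abstract reports that ROC curves were used to assess the predictive value of each candidate gene. As a minimal illustration of what such an assessment measures, the sketch below computes the area under the ROC curve (AUC) for one gene's expression values using the rank-based (Mann-Whitney) formulation. This is not the authors' pipeline: the function name `roc_auc`, the variable names, and the expression values are all hypothetical and invented purely for illustration.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive sample
    (label 1) has a higher score than a randomly chosen negative
    sample (label 0); ties count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up expression values for a single hypothetical gene
# (NOT data from the paper): 4 IPF samples, then 4 controls.
expression = [8.1, 7.9, 8.4, 7.2, 6.0, 6.3, 7.5, 5.8]
is_ipf = [1, 1, 1, 1, 0, 0, 0, 0]

auc = roc_auc(expression, is_ipf)
print(f"AUC = {auc:.4f}")
```

An AUC near 1.0 means the gene's expression separates the two groups almost perfectly, while 0.5 corresponds to chance. The pairwise loop is quadratic in sample count, which is acceptable for a sketch; a real analysis would more likely rely on an established statistics library.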