Multiple microarray analyses identify key genes associated with the development of Non-Small Cell Lung Cancer from Chronic Obstructive Pulmonary Disease

Lemeng Zhang, Jianhua Chen, Hua Yang, Changqie Pan, Haitao Li, Yongzhong Luo, Tianli Cheng
DOI: https://doi.org/10.7150/jca.51264
IF: 3.9
2021-01-01
Journal of Cancer
Abstract:
Introduction: Chronic obstructive pulmonary disease (COPD) is an independent risk factor for non-small cell lung cancer (NSCLC). This study aimed to identify the key genes and potential molecular mechanisms involved in the progression from COPD to NSCLC.
Methods: Expression profiles of COPD and NSCLC samples in GSE106899, GSE12472, and GSE12428 were downloaded from the Gene Expression Omnibus (GEO) database, and the differentially expressed genes (DEGs) between COPD and NSCLC were identified. Based on these DEGs, functional pathway enrichment and lung carcinogenesis-related network analyses were performed and visualized with Cytoscape software. Principal component analysis (PCA), cluster analysis, and support vector machines (SVM) were then used to verify the ability of the top modular genes to distinguish COPD from NSCLC. The correlations between these key genes and the clinical stage of NSCLC were examined using the UALCAN and HPA websites. Finally, a prognostic risk model was constructed by multivariate Cox regression analysis, and Kaplan-Meier survival curves of the top modular genes were generated for the training and validation sets.
Results: A total of 2350, 1914, and 1850 DEGs were obtained from the GSE106899, GSE12472, and GSE12428 datasets, respectively. Protein-protein interaction network analysis identified a modular gene signature comprising H2AFX, MCM2, MCM3, MCM7, POLD1, and RPA1 as markers that discriminate between COPD and NSCLC. These modular genes were mainly enriched in DNA replication, cell cycle, mismatch repair, and other processes. Their expression levels were significantly higher in NSCLC than in COPD, which was further confirmed by immunohistochemistry. In addition, high expression of H2AFX, MCM2, MCM7, and POLD1 correlated with poor prognosis in lung adenocarcinoma (LUAD).
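The prognostic findings rest on Kaplan-Meier survival curves. As an illustration only, the product-limit estimator behind such curves can be sketched in plain Python. The function name `kaplan_meier` and the toy inputs are hypothetical, and the abstract does not say which software the authors used.

```python
from typing import List, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Product-limit (Kaplan-Meier) estimate of the survival function.

    times  -- follow-up time for each patient
    events -- 1 if death was observed at that time, 0 if the patient was censored
    Returns (time, S(t)) pairs at each distinct time with at least one death.
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at time t (deaths and censorings).
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        # Everyone who died or was censored at t leaves the risk set.
        n_at_risk -= removed
    return curve


# Toy data (hypothetical): follow-up in months, 1 = death, 0 = censored.
print(kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0]))
```

In an analysis like this one, patients would typically be split into high- and low-expression groups for a given gene, with one curve computed per group and the curves then compared.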
The prognostic risk model built by multivariate Cox regression yielded consistent results, and its predictive ability was independent of other clinical variables.
Conclusions: This study identified several key gene modules closely related to NSCLC arising in patients with underlying COPD, providing a deeper understanding of the mechanisms that may underlie malignant progression from COPD to NSCLC. The study also provides valuable prognostic factors for high-risk lung cancer patients with COPD.
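A multivariate Cox model yields one coefficient per gene, and a patient's risk score is commonly computed as the linear predictor: the sum of each coefficient times that gene's expression. The sketch below assumes that convention plus a median cutoff for high- and low-risk groups. The gene set, the coefficient values in `HYPOTHETICAL_COEFS`, and the helper names are placeholders, not the study's fitted model, and fitting the coefficients themselves is out of scope here.

```python
from statistics import median
from typing import Dict, List

# HYPOTHETICAL gene set and coefficients, for illustration only.
# These are NOT the coefficients fitted in the study.
HYPOTHETICAL_COEFS: Dict[str, float] = {
    "H2AFX": 0.30,
    "MCM2": 0.20,
    "MCM7": 0.15,
    "POLD1": 0.25,
}


def risk_score(expression: Dict[str, float], coefs: Dict[str, float]) -> float:
    """Cox linear predictor: sum over model genes of coefficient * expression."""
    return sum(beta * expression[gene] for gene, beta in coefs.items())


def stratify(patients: List[Dict[str, float]], coefs: Dict[str, float]) -> List[str]:
    """Label each patient 'high' or 'low' risk; the median cutoff is an assumption."""
    scores = [risk_score(p, coefs) for p in patients]
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Toy expression values (hypothetical, e.g. on a normalized log2 scale).
patients = [
    {"H2AFX": 1.0, "MCM2": 1.0, "MCM7": 1.0, "POLD1": 1.0},
    {"H2AFX": 2.0, "MCM2": 2.0, "MCM7": 2.0, "POLD1": 2.0},
    {"H2AFX": 0.0, "MCM2": 0.0, "MCM7": 0.0, "POLD1": 0.0},
]
print(stratify(patients, HYPOTHETICAL_COEFS))  # ['low', 'high', 'low']
```

With a strict "greater than median" rule, a patient whose score equals the median falls in the low-risk group; the abstract does not state which cutoff the authors applied.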