Identification and Validation of Two Lung Adenocarcinoma-Development Characteristic Gene Sets for Diagnosing Lung Adenocarcinoma and Predicting Prognosis

Cheng Liu,Xiang Li,Hua Shao,Dan Li
DOI: https://doi.org/10.3389/fgene.2020.565206
IF: 3.7
2020-12-21
Frontiers in Genetics
Abstract:
Background: Lung adenocarcinoma (LUAD) is one of the main types of lung cancer. Because of its low early diagnosis rate, poor late-stage prognosis, and high mortality, it is of great significance to find biomarkers for diagnosis and prognosis.
Methods: Five hundred and twelve LUAD samples from The Cancer Genome Atlas were used for differential expression analysis and short time-series expression miner (STEM) analysis to identify the LUAD-development characteristic genes. Survival analysis was used to identify the LUAD-unfavorable genes and LUAD-favorable genes. Gene set variation analysis (GSVA) was used to score individual samples against the two gene sets. Receiver operating characteristic (ROC) curve analysis and univariate and multivariate Cox regression analysis were used to explore the diagnostic and prognostic ability of the two GSVA score systems. Two independent data sets from Gene Expression Omnibus (GEO) were used to verify the results. Functional enrichment analysis was used to explore the potential biological functions of the LUAD-unfavorable genes.
Results: With the development of LUAD, 185 differentially expressed genes (DEGs) were gradually upregulated, of which 84 were associated with LUAD survival and named the LUAD-unfavorable gene set. In contrast, 237 DEGs were gradually downregulated, of which 39 were associated with LUAD survival and named the LUAD-favorable gene set. ROC curve analysis and univariate/multivariate Cox proportional hazards analyses indicated that both the LUAD-unfavorable GSVA score and the LUAD-favorable GSVA score were biomarkers of LUAD. Moreover, both GSVA score systems were independent factors for LUAD prognosis. The LUAD-unfavorable genes were significantly involved in the p53 signaling pathway, oocyte meiosis, and the cell cycle.
Conclusion: We identified and validated two LUAD-development characteristic gene sets that have both diagnostic and prognostic value. These gene sets may provide new insight for further research on LUAD.
genetics & heredity
What problem does this paper attempt to address?
The paper aims to identify and validate two gene sets that characterize the development of lung adenocarcinoma (LUAD) and that have both diagnostic and prognostic value. Specifically, the researchers analyze LUAD samples at different stages to find genes that are gradually upregulated or downregulated as LUAD develops, and then determine whether these genes are associated with patient survival. The goal is to find new biomarkers that improve the early diagnosis rate of LUAD and the accuracy of prognosis evaluation.

### Research Background

Lung adenocarcinoma (LUAD) is one of the most common types of lung cancer. Because of its low early diagnosis rate, poor prognosis in the advanced stage, and high mortality rate, finding biomarkers for diagnosis and prognosis is of great significance. Existing studies have shown that abnormal expression of certain genes is closely related to the development of LUAD, but most of them do not consider changes in multiple genes and their synergistic effects at the same time.

### Research Methods

1. **Data Sources**: The study used 512 LUAD samples from The Cancer Genome Atlas (TCGA) for differential expression analysis and short time-series expression miner (STEM) analysis to identify LUAD-development characteristic genes. Two independent data sets from Gene Expression Omnibus (GEO) were used to verify the results.
2. **Survival Analysis**: Survival analysis identified the genes associated with poor prognosis (LUAD-unfavorable genes) and with good prognosis (LUAD-favorable genes).
3. **Gene Set Variation Analysis (GSVA)**: GSVA was used to score each individual sample against the two gene sets.
4. **ROC Curve Analysis**: The diagnostic ability of the two GSVA score systems was evaluated with receiver operating characteristic (ROC) curve analysis.
5. **Cox Regression Analysis**: The prognostic ability of the two GSVA score systems was evaluated with univariate and multivariate Cox proportional hazards models.
6. **Functional Enrichment Analysis**: The potential biological functions of the LUAD-unfavorable genes were explored through Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway enrichment analysis.

### Main Results

1. **Identification of Gene Sets**: As LUAD develops, 185 genes are gradually upregulated, of which 84 are associated with LUAD survival and form the LUAD-unfavorable gene set; 237 genes are gradually downregulated, of which 39 are associated with LUAD survival and form the LUAD-favorable gene set.
2. **Diagnostic and Prognostic Capabilities**: ROC curve analysis and Cox regression analysis show that both the LUAD-unfavorable GSVA score and the LUAD-favorable GSVA score are biomarkers for LUAD and independent prognostic factors.
3. **Functional Enrichment Analysis**: The LUAD-unfavorable genes are significantly involved in the p53 signaling pathway, oocyte meiosis, and the cell cycle.

### Conclusion

The study identified and validated two gene sets that characterize the development of LUAD. These gene sets have both diagnostic and prognostic value and provide a new perspective for further research on LUAD.
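
The scoring and ROC steps summarized above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the per-sample score here is a plain mean of per-gene z-scores, a simplified stand-in for GSVA, and the gene names (`GENE_A`, `GENE_B`, `GENE_C`), expression values, and labels are hypothetical toy data. The AUC is computed directly as the probability that a tumor sample scores higher than a normal sample.

```python
from statistics import mean, pstdev


def zscore_rows(expr):
    """Standardize each gene (one list of values per gene) across samples."""
    out = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with no variation carries no signal; score it as 0 everywhere.
        out[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return out


def gene_set_scores(expr, gene_set):
    """Per-sample score: mean z-score over the set's genes found in expr.

    Simplified stand-in for GSVA (assumption), not the GSVA algorithm itself.
    """
    z = zscore_rows(expr)
    genes = [g for g in gene_set if g in z]
    if not genes:
        raise ValueError("no gene of the set is present in the expression table")
    n_samples = len(z[genes[0]])
    return [mean(z[g][i] for g in genes) for i in range(n_samples)]


def roc_auc(scores, labels):
    """AUC as P(score of a label-1 sample > score of a label-0 sample).

    Ties between a positive and a negative sample count as 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required to compute an AUC")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three normal samples followed by three tumor samples.
expr = {
    "GENE_A": [1.0, 1.2, 0.9, 3.1, 2.8, 3.5],
    "GENE_B": [0.5, 0.7, 0.6, 2.0, 2.4, 1.9],
    "GENE_C": [2.0, 2.1, 1.9, 2.0, 2.2, 1.8],
}
labels = [0, 0, 0, 1, 1, 1]            # 0 = normal, 1 = tumor
unfavorable_set = ["GENE_A", "GENE_B"]  # hypothetical "unfavorable" gene set

scores = gene_set_scores(expr, unfavorable_set)
auc = roc_auc(scores, labels)
print(f"AUC of the toy unfavorable score: {auc:.2f}")
```

In a real analysis the score would come from a GSVA implementation and the prognostic step would use a Cox proportional hazards model from a survival-analysis library. Both are left out here to keep the sketch self-contained.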