Molecular clustering based on gene set expression and its relationship with prognosis in patients with lung adenocarcinoma

Baobao Xing, Lei Shi, Zhiguo Bao, Ying Liang, Bo Liu, Ruihan Liu
DOI: https://doi.org/10.21037/jtd-22-557
Abstract: Background: Lung adenocarcinoma (LUAD) is a subtype of lung cancer with high morbidity and mortality. While genotyping is an important determinant of prognosis in LUAD patients, there is a paucity of studies on gene set-based expression (GSE) typing for LUAD. This study used GSE methodology to perform gene typing of LUAD patients. Methods: Clinical and genomic information for the LUAD patients was downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Patients with LUAD were clustered into molecular subtypes according to their clinical and gene set expression characteristics. Survival rates and silhouette widths were compared among the molecular subtypes. Differences in survival rate between gene sets were analyzed using Kaplan-Meier survival curves. Cox regression and Lasso regression were used to establish a prognostic gene set model based on the TCGA database, and the results were validated using the GEO dataset. Results: A total of 10 hub genes were finally identified and clustered into 3 subtypes with a mean silhouette width of 0.96. There were significant differences in survival rates among the 3 subtypes (P<0.05). Gene Ontology (GO) analysis indicated that the related biological processes (BP) were mainly involved in regulation of the cell cycle, mitotic cell cycle phase transition, and the proteasome-mediated ubiquitin-dependent protein catabolic process. The cellular components (CC) were related to the spindle, chromosomal region, and midbody. The molecular functions (MF) mainly involved ubiquitin-like protein ligase binding, translation regulator activity, and oxidation activity. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the main pathways included Epstein-Barr virus infection, pathways of neurodegeneration, the p53 signaling pathway, and the proteasome pathway.
In addition, the protein-protein interaction network was analyzed using STRING and Cytoscape, and the top 9 hub genes identified were KIF2C, DLGAP5, KIF20A, PSMC1, PSMD1, PSMB7, SNAI2, FGF13, and BMP2. Conclusions: Patients with LUAD can be clustered into three subtypes based on the expression of gene sets. These findings contribute to understanding the pathogenesis and molecular mechanisms of LUAD and may lead to potential individualized pharmacogenetic therapy for patients with LUAD.
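
The abstract reports cluster quality as a mean silhouette width. For readers unfamiliar with the measure, the following is a minimal sketch of how it can be computed, assuming Euclidean distance and a toy set of two-dimensional scores. The function name `mean_silhouette_width` and the example data are hypothetical illustrations and are not taken from the authors' pipeline.

```python
from math import dist
from statistics import mean


def mean_silhouette_width(points, labels):
    """Mean silhouette width over all samples, using Euclidean distance.

    For each sample, a is its mean distance to the other members of its
    own cluster and b is the smallest mean distance to any other cluster;
    the sample's silhouette width is (b - a) / max(a, b).
    """
    # Group sample indices by cluster label.
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = mean(dist(points[i], points[j]) for j in own)
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return mean(widths)


# Hypothetical toy data: two tight, well-separated groups of samples.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette_width(toy_points, [0, 0, 1, 1])
bad = mean_silhouette_width(toy_points, [0, 1, 0, 1])
print(f"well-separated labels: {good:.3f}, mixed labels: {bad:.3f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned samples, which is why a mean width of 0.96 is read as a clean separation of the subtypes.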