Abstract: Background: Lung adenocarcinoma (LUAD) is a subtype of lung cancer with high morbidity and mortality. While genotyping is an important determinant of prognosis in LUAD patients, there is a paucity of studies on gene set-based expression (GSE) typing for LUAD. This study used GSE methodology to perform gene typing of LUAD patients. Methods: Clinical and genomic information of LUAD patients was downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Patients with LUAD were clustered into molecular subtypes according to their clinical and gene set expression characteristics. Survival rates and silhouette widths were compared among the molecular subtypes. Differences in survival rate between gene sets were analyzed using Kaplan-Meier survival curves. Cox regression and Lasso regression were used to establish a prognostic gene set model based on the TCGA database, and the results were validated using the GEO dataset. Results: A total of 10 hub genes were finally identified and clustered into 3 subtypes with a mean silhouette width of 0.96. There were significant differences in survival rates among the 3 subtypes (P<0.05). Gene Ontology (GO) analysis indicated that the related biological processes (BP) were mainly involved in regulation of the cell cycle, mitotic cell cycle phase transition, and the proteasome-mediated ubiquitin-dependent protein catabolic process. The cellular components (CC) were related to the spindle, chromosomal region, and midbody. Molecular functions (MF) mainly focused on ubiquitin-like protein ligase binding, translation regulator activity, and oxidation activity. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the main pathways included the Epstein-Barr virus infection pathway of neurogeneration, the p53 signaling pathway, and the proteome pathways.
In addition, the protein-protein interaction network was analyzed using the STRING database and Cytoscape software, and the top 9 hub genes identified were KIF2C, DLGAP5, KIF20A, PSMC1, PSMD1, PSMB7, SNAI2, FGF13, and BMP2. Conclusions: Patients with LUAD can be clustered into three subtypes based on the expression of gene sets. These findings contribute to understanding the pathogenesis and molecular mechanisms of LUAD, and may lead to potential individualized pharmacogenetic therapy for patients with LUAD.
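
The abstract reports cluster quality as a mean silhouette width. As a rough illustration of what that statistic measures, the sketch below computes it from scratch on synthetic two-dimensional points. It is not the authors' pipeline: the function name `mean_silhouette_width`, the Euclidean distance, and the toy data are all assumptions made for the example, and real gene set expression profiles would have many more dimensions.

```python
import math


def mean_silhouette_width(points, labels):
    """Mean silhouette width over all samples (Euclidean distance).

    Hypothetical helper for illustration only. For each sample,
    a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    and the silhouette is (b - a) / max(a, b).
    """
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    n = len(points)
    widths = []
    for i in range(n):
        # Mean distance from sample i to each cluster, excluding itself.
        mean_dist = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                total = sum(math.dist(points[i], points[j]) for j in members)
                mean_dist[c] = total / len(members)

        own = labels[i]
        if own not in mean_dist:
            # Singleton cluster: the silhouette is conventionally 0.
            widths.append(0.0)
            continue

        a = mean_dist[own]
        b = min(d for c, d in mean_dist.items() if c != own)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)

    return sum(widths) / n


# Synthetic data (an assumption): three tight, well-separated groups.
points = [
    (0, 0), (0, 1), (1, 0),
    (10, 10), (10, 11), (11, 10),
    (20, 0), (20, 1), (21, 0),
]
labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]

score = mean_silhouette_width(points, labels)
print(f"mean silhouette width: {score:.2f}")
```

Values close to 1 indicate compact, well-separated clusters, while values near 0 or below suggest overlapping or misassigned samples. Assigning the same points to clusters arbitrarily would yield a much lower score.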

Probabilistic Lung Cancer Models Conditioned on Gene Expression Microarray Data

LUADpp: an Effective Prediction Model on Prognosis of Lung Adenocarcinomas Based on Somatic Mutational Features

Identification of Genes Associated with Lung Adenocarcinoma Prognosis

Survival ensembles by the sum of pairwise differences with application to lung cancer microarray studies

Cancer prediction with gene expression profiling and differential evolution

A large cohort study identifying a novel prognosis prediction model for lung adenocarcinoma through machine learning strategies

Integrating genetic mutations and expression profiles for survival prediction of lung adenocarcinoma

Integrating Biological Knowledge with Gene Expression Profiles for Survival Prediction of Cancer

Analysis of prognostic model based on immunotherapy related genes in lung adenocarcinoma

An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling

Histopathological imaging features- versus molecular measurements-based cancer prognosis modeling

A Prediction Model for Lung Cancer Diagnosis that Integrates Genomic and Clinical Features

Machine-learning and scRNA-Seq-based diagnostic and prognostic models illustrating survival and therapy response of lung adenocarcinoma

A prognostic model for the combined analysis of gene expression profiling in hepatocellular carcinoma

Molecular clustering based on gene set expression and its relationship with prognosis in patients with lung adenocarcinoma

The Combined Detection of Immune Genes for Predicting the Prognosis of Patients With Non-Small Cell Lung Cancer

Cancer adjuvant chemotherapy prediction model for non‐small cell lung cancer

Applying machine learning algorithms to develop a survival prediction model for lung adenocarcinoma based on genes related to fatty acid metabolism

A novel 14-gene signature for overall survival in lung adenocarcinoma based on the Bayesian hierarchical Cox proportional hazards model

Gene selection and classification for cancer microarray data based on machine learning and similarity measures