Abstract: Background: Lung adenocarcinoma (LUAD) is a subtype of lung cancer with high morbidity and mortality. While genotyping is an important determinant of prognosis in LUAD patients, there is a paucity of studies on gene set-based expression (GSE) typing for LUAD. This study used GSE methodology to perform gene typing of LUAD patients.

Methods: Clinical and genomic information for the LUAD patients was downloaded from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases. Patients with LUAD were clustered into molecular subtypes based on their clinical and gene set expression characteristics. Survival rates and silhouette widths were compared across the molecular subtypes. Differences in survival rate between gene sets were analyzed using Kaplan-Meier survival curves. Cox regression and Lasso regression were used to establish the prognostic gene set model based on the TCGA database, and the results were validated using the GEO dataset.

Results: A total of 10 hub genes were finally identified and clustered into 3 subtypes with a mean silhouette width of 0.96. There were significant differences in survival rates among the 3 subtypes (P<0.05). Gene Ontology (GO) analysis indicated that the related biological processes (BP) were mainly involved in regulation of cell cycle, mitotic cell cycle phase transition, and proteasome-mediated ubiquitin-dependent protein catabolic process. The cellular components (CC) were related to the spindle, chromosomal region, and midbody. Molecular function (MF) mainly focused on ubiquitin-like protein ligase binding, translation regulator activity, and oxidation activity. Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis showed that the main pathways included the Epstein-Barr virus infection pathway of neurogeneration, the p53 signaling pathway, and the proteome pathways. In addition, the protein-protein interaction network was analyzed using the STRING and Cytoscape software, and the top 9 hub genes identified were KIF2C, DLGAP5, KIF20A, PSMC1, PSMD1, PSMB7, SNAI2, FGF13, and BMP2.

Conclusions: Patients with LUAD can be clustered into three subtypes based on the expression of gene sets. These findings contribute to understanding the pathogenesis and molecular mechanisms of LUAD, and may lead to potential individualized pharmacogenetic therapy for patients with LUAD.
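The abstract reports cluster quality as a mean silhouette width. As a rough illustration of what that statistic measures, the following is a minimal sketch in plain Python, not the authors' pipeline. The function name `mean_silhouette_width`, the Euclidean distance, and the toy coordinates are all assumptions made for the example; the study's actual inputs would be gene set expression scores per patient.

```python
import math

def mean_silhouette_width(points, labels):
    """Average silhouette width over all samples (Euclidean distance).

    For sample i: a = mean distance to the other members of its own cluster,
    b = smallest mean distance to the members of any other cluster,
    s(i) = (b - a) / max(a, b).
    """
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette width needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # common convention: singleton clusters score 0
            continue
        a = mean_dist(i, own)
        b = min(mean_dist(i, members)
                for other, members in clusters.items() if other != lab)
        denom = max(a, b)
        widths.append((b - a) / denom if denom > 0 else 0.0)
    return sum(widths) / len(widths)

# Hypothetical toy data: three well-separated groups of two samples each.
toy_points = [(0, 0), (0, 1), (10, 10), (10, 11), (20, 0), (20, 1)]
toy_labels = [0, 0, 1, 1, 2, 2]
toy_score = mean_silhouette_width(toy_points, toy_labels)
```

Values close to 1 indicate tight, well-separated clusters, while values near 0 indicate overlapping clusters and negative values suggest misassigned samples.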

Using Gene Ontology-based clustering method to study the genetic heterogeneity of leukemia

Characterizing Heterogeneity in Leukemic Cells Using Single-Cell Gene Expression Analysis

Application of New Clustering Algorithms in Gene Expression Data

A novel clustering analysis based on PCA and SOMs for gene expression patterns

Genetic Algorithms Applied to Multi-Class Clustering for Gene Expression Data

Clustering cancer gene expression data: a comparative study

Clustering gene expression data based on predicted differential effects of GV interaction.

Identifying differentially expressed genes in human acute leukemia and mouse brain microarray datasets utilizing QTModel

Robust structured heterogeneity analysis approach for high-dimensional data

Gene Differential Expression Analysis for Leukemia Based on Relative Risk

Data Clustering Algorithm for DNA Microarray Based on Graph Theory

Heterogeneity Between Primary Colon Carcinoma and Paired Lymphatic and Hepatic Metastases.

Unsupervised Hierarchical Clustering Identifies Immune Gene Subtypes in Gastric Cancer

ClustEx2: Gene Module Identification Using Density-Based Network Hierarchical Clustering

Clustering of Transcriptomic Data for the Identification of Cancer Subtypes

Prospective Identification of Prognostic Hot-Spot Mutant Gene Signatures for Leukemia: A Computational Study Based on Integrative Analysis of TCGA and cBioPortal Data

Molecular clustering based on gene set expression and its relationship with prognosis in patients with lung adenocarcinoma

Analysis of four types of leukemia using Gene Ontology term and Kyoto Encyclopedia of Genes and Genomes pathway enrichment scores.

Identification of two heterogeneous subtypes of hepatocellular carcinoma with distinct pathway activities and clinical outcomes based on gene set variation analysis

Bioinformatics and Raman spectroscopy-based identification of key pathways and genes enabling differentiation between acute myeloid leukemia and T cell acute lymphoblastic leukemia

Tumor Heterogeneity in Gastrointestinal Cancer Based on Multimodal Data Analysis