Discovery and Construction of Prognostic Model for Clear Cell Renal Cell Carcinoma Based on Single-Cell and Bulk Transcriptome Analysis

Fangyuan Zhang, Shicheng Yu, Pengjie Wu, Liansheng Liu, Dong Wei, Shengwen Li
DOI: https://doi.org/10.21037/tau-21-581
2021-01-01
Translational Andrology and Urology
Abstract:
Background: Clear cell renal cell carcinoma (ccRCC) is the most common malignant kidney tumor in adults. Single-cell transcriptome sequencing provides accurate gene expression data for individual cells. Integrating single-cell and bulk transcriptome data from ccRCC samples gives a more comprehensive picture, enabling new insights into ccRCC and the construction of a novel prognostic model for ccRCC patients.
Methods: Single-cell transcriptome sequencing data were preprocessed with the Seurat package in R. Principal component analysis (PCA) and the t-distributed stochastic neighbor embedding (t-SNE) algorithm were used for cluster classification. Two subtypes of cancer cells were identified, and pseudotime trajectory analysis and gene ontology (GO) analysis were conducted with the monocle and clusterProfiler packages. Two novel cancer cell biomarkers were identified from the single-cell sequencing data and confirmed with The Cancer Genome Atlas (TCGA) data. T cell-related marker genes identified by single-cell sequencing were screened by a combination of Kaplan-Meier (KM) analysis, univariate Cox analysis, least absolute shrinkage and selection operator (Lasso) regression, and multivariate Cox analysis of TCGA data. Four survival-predicting genes were selected to develop a risk score model. A nomogram combining the risk score and clinical information was constructed to predict the prognosis of ccRCC patients.
Results: A total of 5,933 cells were included in the study after quality control. Fifteen cell clusters were identified by PCA and the t-SNE algorithm. Two clusters of cancer cells with distinct differentiation status were identified, and GO analysis revealed that the biological processes differed between the two subgroups. Egl-9 family hypoxia-inducible factor 3 (EGLN3) and nucleolar protein 3 (NOL3) were specifically expressed in the cancer cell clusters, and bulk RNA sequencing data from TCGA confirmed their high expression in ccRCC tissues. GTSE1, CENPF, SMC2 and H2AFV were selected and used to construct the risk score model. A nomogram was generated to predict the prognosis of ccRCC by combining the risk score and clinical parameters.
Conclusions: We integrated single-cell and bulk transcriptome data from ccRCC in this study. Two subtypes of ccRCC cells with different biological characteristics and two potential biomarkers of ccRCC were discovered. A novel prognostic model was constructed for clinical application.
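To make the risk score step concrete, the following is a minimal sketch of how a four-gene risk score is commonly computed: a weighted sum of each gene's expression, with weights taken from a fitted multivariate Cox model. The abstract does not report the coefficients or the cutoff rule, so the coefficient values, the median split, the function names, and the patient data below are all hypothetical and serve only as illustration.

```python
from statistics import median

# HYPOTHETICAL coefficients for illustration only; the abstract does not
# report the fitted multivariate Cox coefficients for these genes.
COEFFICIENTS = {"GTSE1": 0.30, "CENPF": 0.20, "SMC2": -0.10, "H2AFV": 0.15}


def risk_score(expression, coefficients=COEFFICIENTS):
    """Return the linear predictor: sum of coefficient * expression per gene."""
    return sum(coef * expression[gene] for gene, coef in coefficients.items())


def stratify(patients, coefficients=COEFFICIENTS):
    """Score every patient and split into high/low risk at the median score.

    The median cutoff is an assumption; the abstract does not state how
    patients were divided into risk groups.
    """
    scores = {pid: risk_score(expr, coefficients) for pid, expr in patients.items()}
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return scores, cutoff, groups


# Made-up expression values (arbitrary units) for four fictional patients.
patients = {
    "P1": {"GTSE1": 2.0, "CENPF": 3.0, "SMC2": 1.0, "H2AFV": 4.0},
    "P2": {"GTSE1": 1.0, "CENPF": 1.0, "SMC2": 1.0, "H2AFV": 1.0},
    "P3": {"GTSE1": 0.0, "CENPF": 0.0, "SMC2": 2.0, "H2AFV": 0.0},
    "P4": {"GTSE1": 3.0, "CENPF": 2.0, "SMC2": 0.0, "H2AFV": 1.0},
}

scores, cutoff, groups = stratify(patients)
for pid in patients:
    print(pid, round(scores[pid], 2), groups[pid])
```

In a real analysis the coefficients would come from the fitted Cox model, and the resulting score would enter the nomogram alongside the clinical variables.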