Deep learning survival model on transcriptomes level in patients with non-small cell lung cancer.

Hao Yu, Li Yang, Ka-On Lam, Jian-Yue Jin, Chen Hu, Feng-Ming Spring Kong
DOI: https://doi.org/10.1200/jco.2021.39.15_suppl.e20518
IF: 45.3
2021-05-20
Journal of Clinical Oncology
Abstract: e20518 Background: Non-small cell lung cancer (NSCLC) is associated with poor prognosis. Relating global gene expression profiles to overall survival (OS) may help individualize survival prediction. In this study, we identified biologically important gene clusters and studied their prognostic ability for OS using a deep learning method. Methods: Using the GEO genomics data repository, we identified 196 NSCLC patients (training set: GSE37745) and 181 NSCLC patients (test set: GSE50081) with clinical information and long-term follow-up. In both cohorts, expression profiling was performed on RNA from tumor tissue using Affymetrix HG-U133-Plus2 microarrays and normalized using Robust Multiarray Averaging (RMA). We established deep learning survival models for predicting OS through a neural network extension of the Cox regression model; the models were developed by 5-fold cross-validation in GSE37745 and independently validated in GSE50081. Significant RNA-seq features and clinical variables served as the model inputs. The concordance index (CI) was evaluated and compared with that of multivariable Cox regression. We then applied Uniform Manifold Approximation and Projection (UMAP) to the hidden-layer weights of the model to cluster the important RNA-seq features, and performed enrichment analysis through GO/KEGG to reveal the underlying biological processes. Results: In total, 1039 RNA-seq levels were significantly associated with OS (P < 0.05) in the training set by a Cox proportional hazards model adjusted for clinical variables (age, gender, cancer stage, histology). The deep learning survival model using the 20 most significant RNA-seq features plus clinical variables had the best average performance, with CI = 0.74±0.04 in the training set (GSE37745) and CI = 0.68±0.06 in the test set (GSE50081) over 10 iterations, better than multivariable Cox regression (P < 0.05). A deep learning survival model using all significant RNA-seq features was also established, and the weights in its hidden layer were clustered by UMAP into 5 positive and 5 negative clusters.
Enrichment analysis of the clusters showed that, in positive clusters, negative regulation of RNA metabolic process, negative regulation of RNA biosynthetic process, and positive regulation of protein modification process were the top three significant biological processes associated with shortened survival, while in negative clusters, DNA metabolic process, positive regulation of phosphate metabolic process, and positive regulation of RNA metabolic process were the top three associated with prolonged survival. Conclusions: In this study, a deep learning survival algorithm was established for survival prediction based on transcriptome-level data in patients with NSCLC. Given the models’ robustness and better performance, our approach may be useful for predicting survival and for applying more biological information to survival analysis.
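As a rough illustration of two quantities named in the Methods, the sketch below computes a Cox-type training objective (the negative partial log-likelihood that a neural network extension of Cox regression would typically minimize) and the concordance index used for evaluation. It is not the authors' implementation: the function names, the toy data, and the choice of a Breslow-style partial likelihood without special handling of tied times are all assumptions made here for illustration.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : predicted log-hazard per patient (e.g. a network's output)
    time  : follow-up time per patient
    event : 1 if death was observed, 0 if censored
    Assumption: tied event times are not handled specially.
    """
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=float)
    # Sort by descending time so the risk set of row i is rows 0..i.
    order = np.argsort(-time, kind="stable")
    risk, event = risk[order], event[order]
    # Running log-sum-exp of the risk scores over each risk set.
    log_risk_set = np.logaddexp.accumulate(risk)
    return -np.sum((risk - log_risk_set) * event) / max(event.sum(), 1.0)


def concordance_index(risk, time, event):
    """Harrell's C: share of comparable pairs ranked in the right order."""
    risk = np.asarray(risk, dtype=float)
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # comparable only if the shorter time is an observed event
        for j in range(len(time)):
            if time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # higher risk died earlier
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    return concordant / comparable


# Hypothetical toy cohort (not data from the study).
toy_time = np.array([5.0, 8.0, 12.0, 20.0, 30.0])
toy_event = np.array([1, 1, 0, 1, 0])
toy_risk = np.array([2.0, 1.0, 0.5, 0.2, -1.0])  # higher score = higher hazard

toy_loss = cox_neg_log_partial_likelihood(toy_risk, toy_time, toy_event)
toy_c = concordance_index(toy_risk, toy_time, toy_event)
```

In this toy cohort the risk scores are ordered exactly opposite to survival time, so every comparable pair is concordant. In a setting like the one described, the loss would be minimized on the training folds and the concordance index reported on held-out data.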
oncology