A Transformer-Based Approach to Survival Outcome Prediction

Ted Mellors, Matt Schneider
DOI: https://doi.org/10.1101/2024.11.03.621674
2024-11-03
Abstract: Accurate prediction of patient survival outcomes is a critical challenge in cancer research, with the potential to inform personalized treatment strategies and improve patient care. We leveraged Geneformer, a state-of-the-art transformer model pre-trained on a massive single-cell RNA-seq dataset, to develop a model for predicting overall survival (OS). We adapted Geneformer to bulk tumor data by appending a task-specific transformer layer and fine-tuning the model on RNA-seq data from The Cancer Genome Atlas (TCGA). We also employed a rank-value encoding scheme to prioritize informative genes and reduce noise. Our model showed a robust correlation between predicted and true OS, with a Pearson correlation coefficient of 0.72 (p < 0.00001). Survival analysis revealed significant differences in survival between patient subgroups stratified by the model's predictions. The Geneformer-based model outperformed traditional machine learning approaches (Random Forest and Neural Network) in patient stratification tasks. Further analysis showed that the model's performance was consistent across tumor stages and patient subgroups. Our study highlights the potential of pre-trained transformer models, originally developed for single-cell data analysis, to predict clinically relevant outcomes from bulk tumor gene expression data. The superior performance of our Geneformer-based model underscores its potential to enhance prognostication and treatment decision-making in cancer research. Future work will focus on refining the model architecture, incorporating multi-omics data, and validating performance on external datasets to further advance clinical utility.
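The rank-value encoding mentioned in the abstract can be illustrated with a minimal sketch. It assumes the general scheme described for Geneformer: each gene's expression in a sample is scaled by sequencing depth and by a corpus-wide reference value for that gene, and the sample is then represented as a list of gene tokens ordered from highest to lowest scaled value. The function name `rank_value_encode`, the `gene_medians` argument, and the `max_len` default are illustrative assumptions, not the authors' code or the Geneformer API.

```python
def rank_value_encode(expression, gene_medians, gene_ids, max_len=2048):
    """Return gene IDs ordered by normalized expression, highest first.

    expression   : raw counts for one sample, one value per gene
    gene_medians : hypothetical per-gene reference values (e.g. the median
                   nonzero expression of each gene across a training corpus)
    gene_ids     : gene identifiers aligned with `expression`
    max_len      : assumed cap on the number of tokens kept
    """
    total = sum(expression)
    if total <= 0:
        return []

    scored = []
    for gene, count, median in zip(gene_ids, expression, gene_medians):
        # Undetected genes carry no rank information and are skipped.
        if count > 0 and median > 0:
            # Scale by sequencing depth, then by the gene's reference value,
            # so ubiquitously high genes are pushed down the ranking.
            scored.append((count / total / median, gene))

    # Stable sort, descending by scaled value; ties keep input order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [gene for _, gene in scored[:max_len]]


tokens = rank_value_encode(
    expression=[10, 0, 5, 5],
    gene_medians=[10, 1, 1, 10],
    gene_ids=["A", "B", "C", "D"],
)
print(tokens)
```

In this toy input, gene A has the highest raw count but also a high reference value, so it ranks below gene C; this is the sense in which the encoding prioritizes genes that are informative for a given sample.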
Bioinformatics