Data Tells the Truth: A Knowledge Distillation Method for Genomic Survival Analysis by Handling Censoring

Xiu-Shen Wei, He-Yang Xu, Ye Wu, Xiaoming Liu, Ruru Gao, Jiacheng Liu, Bowen Du
DOI: https://doi.org/10.1016/j.fmre.2024.06.016
2024-01-01
Fundamental Research
Abstract: Survival analysis is a critical tool for cancer research, yet handling censored data remains challenging due to supervision bias and inaccurate hazard estimates. To address these issues, we propose a simple but effective method termed KD, which employs knowledge distillation using uncensored data to rectify the supervision bias in censored data. This approach leverages both the rectified censored data and the uncensored data to improve survival prediction accuracy. Our KD method not only makes effective use of censored data but also better reflects clinical reality. We applied our KD method to 19 target cancer sites using The Cancer Genome Atlas (TCGA) dataset. Our method consistently outperforms traditional machine learning and deep learning-based methods across both target cancer sites and independent cancer cohorts. More importantly, our data-driven approach enables the model to extract hidden information from censored data, leading to conclusions that align more closely with clinical knowledge and scenarios. These results validate the effectiveness of our KD method and highlight the value of using censored data rationally, providing insights for cancer research and clinical decisions. All data and code are freely available at: https://datatellstruth.github.io/.
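The abstract describes the general idea only, so the following is a minimal illustrative sketch of that idea and not the authors' implementation. It assumes a linear teacher fitted on log survival times of uncensored samples, and it assumes that a censored label is rectified by taking the larger of the censoring time and the teacher's prediction (a censored patient is known to have survived at least until the censoring time). The function names `fit_linear`, `predict_linear`, `rectify_censored_labels`, and `train_student` are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear fit with an intercept term; returns the weights."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w


def predict_linear(w, X):
    """Apply weights from fit_linear to new samples."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return A @ w


def rectify_censored_labels(X, time, event):
    """Return log-time targets in which censored labels are rectified.

    event == 1 means the event was observed (uncensored);
    event == 0 means the sample is censored.
    Assumption: the teacher is a linear model trained on uncensored data only.
    """
    event = np.asarray(event).astype(bool)
    log_t = np.log(np.asarray(time, dtype=float))

    # Teacher: trained only on samples whose true survival time is known.
    teacher = fit_linear(X[event], log_t[event])

    # Censoring time is a lower bound on survival time, so never go below it.
    rectified = log_t.copy()
    teacher_pred = predict_linear(teacher, X[~event])
    rectified[~event] = np.maximum(log_t[~event], teacher_pred)
    return rectified


def train_student(X, time, event):
    """Student: trained on uncensored data plus rectified censored data."""
    targets = rectify_censored_labels(X, time, event)
    return fit_linear(X, targets)
```

In this sketch the student sees every sample, but the censored ones carry teacher-informed targets instead of raw censoring times, which is one simple way to reduce the downward bias that raw censored labels would introduce.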