Autosurv: interpretable deep learning framework for cancer survival analysis incorporating clinical and multi-omics data

Lindong Jiang, Chao Xu, Yuntong Bai, Anqi Liu, Yun Gong, Yu-Ping Wang, Hong-Wen Deng
DOI: https://doi.org/10.1038/s41698-023-00494-6
2024-01-05
npj Precision Oncology
Abstract: Accurate prognosis for cancer patients can provide critical information for optimizing treatment plans and improving life quality. Combining omics data and demographic/clinical information can offer a more comprehensive view of cancer prognosis than using omics or clinical data alone and can also reveal the underlying disease mechanisms at the molecular level. In this study, we developed and validated a deep learning framework to extract information from high-dimensional gene expression and miRNA expression data and conduct prognosis prediction for breast cancer and ovarian-cancer patients using multiple independent multi-omics datasets. Our model achieved significantly better prognosis prediction than the current machine learning and deep learning approaches in various settings. Moreover, an interpretation method was applied to tackle the “black-box” nature of deep neural networks and we identified features (i.e., genes, miRNA, demographic/clinical variables) that were important to distinguish predicted high- and low-risk patients. The significance of the identified features was partially supported by previous studies.
oncology
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to improve the accuracy of prognosis prediction for cancer patients. Specifically, the authors developed and validated a deep learning framework, **AUTOSurv**, that integrates clinical information with multi-omics data (gene expression and miRNA expression) to improve prognosis prediction for breast cancer and ovarian cancer patients.

#### Main problems

1. **Improving the accuracy of prognosis prediction**:
   - Accurate cancer prognosis prediction is crucial for optimizing treatment plans and improving quality of life.
   - Using omics data or clinical data alone for prognosis prediction has limitations; combining the two gives a more comprehensive view and can reveal disease mechanisms at the molecular level.
2. **Dealing with high-dimensional, low-sample-size data**:
   - Omics data are usually high-dimensional while the sample size is small, which easily leads to overfitting and hurts the model's ability to generalize.
   - AUTOSurv addresses this with a specially designed variational autoencoder (VAE) for dimension reduction.
3. **Interpreting the "black-box" nature of deep neural networks**:
   - Deep neural networks are usually considered "black boxes" whose decision-making processes are difficult to explain.
   - The authors applied the DeepSHAP interpretation method to identify features (such as genes, miRNAs, and clinical variables) that are important for distinguishing high-risk from low-risk patients, thereby improving the interpretability of the model.

#### Solutions

- **Framework design**: The AUTOSurv framework consists of two steps:
  1. **KL-PMVAE**: A pathway-information-guided variational autoencoder that extracts low-dimensional latent features from high-dimensional gene expression and miRNA expression data.
  2. **LFSurv**: A multi-layer perceptron network that combines the latent features with demographic/clinical variables to calculate a prognosis index (PI) for each patient. The higher the PI, the higher the risk of death.
- **Performance evaluation**: Validated on multiple independent multi-omics datasets, AUTOSurv significantly outperforms existing machine learning and deep learning methods in various settings.
- **Feature interpretation**: The DeepSHAP method is used to interpret the model and identify genes, miRNAs, and pathways that contribute significantly to prognosis prediction.

### Summary

By developing the AUTOSurv framework, this paper aims to improve the accuracy of cancer patient prognosis prediction, address the high-dimensional, low-sample-size problem, and make the model more interpretable. These improvements help in understanding the hidden mechanisms of cancer progression and provide support for clinical decision-making.
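
The two-step data flow described above (an encoder that compresses omics features into a latent vector, then a multi-layer perceptron that combines the latent vector with clinical variables into a scalar PI) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the layer sizes, the `tanh` activation, the random untrained weights, the helper names (`encode`, `prognosis_index`, `stratify`), and the median cutoff for high/low risk are all assumptions made for illustration, and the pathway-guided structure of KL-PMVAE is not modeled.

```python
import math
import random

random.seed(0)  # fixed seed so the toy run is reproducible


def linear(x, weights, bias):
    """Dense layer for one sample: returns weights @ x + bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def random_layer(n_in, n_out):
    """Random (untrained) weights standing in for learned parameters."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[random.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(omics, enc_mu, enc_logvar, sample=False):
    """Step 1 (VAE-style encoder): map omics features to a latent vector.

    Returns the latent mean, or a reparameterized draw mu + sigma * eps
    when sample=True (as a VAE would do during training).
    """
    mu = linear(omics, *enc_mu)
    if not sample:
        return mu
    logvar = linear(omics, *enc_logvar)
    return [m + math.exp(0.5 * lv) * random.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def prognosis_index(latent, clinical, hidden, out):
    """Step 2 (MLP): concatenate latent and clinical features, output a scalar PI."""
    h = [math.tanh(v) for v in linear(latent + clinical, *hidden)]
    return linear(h, *out)[0]


def stratify(pis):
    """Label a patient 'high' if their PI exceeds the cohort median (assumed cutoff)."""
    ordered = sorted(pis)
    n = len(ordered)
    median = ordered[n // 2] if n % 2 else 0.5 * (ordered[n // 2 - 1] + ordered[n // 2])
    return ["high" if pi > median else "low" for pi in pis]


# Toy dimensions (hypothetical): 200 omics features -> 8 latent features, 3 clinical variables.
N_OMICS, N_LATENT, N_CLINICAL, N_HIDDEN, N_PATIENTS = 200, 8, 3, 4, 6
enc_mu = random_layer(N_OMICS, N_LATENT)
enc_logvar = random_layer(N_OMICS, N_LATENT)
hidden = random_layer(N_LATENT + N_CLINICAL, N_HIDDEN)
out = random_layer(N_HIDDEN, 1)

# Synthetic patients: (omics vector, clinical vector) pairs drawn from a standard normal.
patients = [
    ([random.gauss(0, 1) for _ in range(N_OMICS)],
     [random.gauss(0, 1) for _ in range(N_CLINICAL)])
    for _ in range(N_PATIENTS)
]

pis = [prognosis_index(encode(omics, enc_mu, enc_logvar), clinical, hidden, out)
       for omics, clinical in patients]
groups = stratify(pis)
```

With trained weights, `pis` would rank patients by predicted risk; here the values are arbitrary and only the shapes and the flow of data are meaningful. Keeping the encoder separate from the PI network mirrors the summary's point that dimension reduction happens before the clinical variables are introduced, so the small clinical feature set is not swamped by thousands of omics inputs.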