Integrating multi-omics data through deep learning for accurate cancer prognosis prediction
Hua Chai, Xiang Zhou, Zhongyue Zhang, Jiahua Rao, Huiying Zhao, Yuedong Yang
DOI: https://doi.org/10.1016/j.compbiomed.2021.104481
IF: 7.7
2021-07-01
Computers in Biology and Medicine
Abstract:<h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Background</h3><p>Genomic information is now widely used to guide precise cancer treatment. Because each type of omics data represents only a single view and suffers from noise and bias, multiple types of omics data are required for accurate cancer prognosis prediction. However, integrating multi-omics data effectively is challenging because of the large number of redundant variables relative to the small sample size. With recent progress in deep learning, Autoencoders have been used to integrate multi-omics data and extract representative features. Nevertheless, the resulting models are vulnerable to data noise. Additionally, previous studies usually focused on individual cancer types without comprehensive pan-cancer testing. Here, we employed a denoising Autoencoder to obtain a robust representation of the multi-omics data, and then used the learned representative features to estimate patients' risks.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Results</h3><p>Applied to 15 cancer types from The Cancer Genome Atlas (TCGA), our method improved C-index values over previous methods by 6.5% on average. Because multi-omics data are difficult to obtain in practice, we further trained XGBoost models to fit the estimated risks using only mRNA data, and these models achieved an average C-index of 0.627. As a case study, the breast cancer prognosis prediction model was independently tested on three datasets from the Gene Expression Omnibus (GEO) and significantly separated high-risk patients from low-risk ones (C-index > 0.6, p-values < 0.05).
Based on the risk subgroups defined by our method, we identified nine prognostic markers highly associated with breast cancer, seven of which are supported by previous literature.</p><h3 class="u-h4 u-margin-m-top u-margin-xs-bottom">Conclusion</h3><p>Our comprehensive tests indicate that we have constructed an accurate and robust framework for integrating multi-omics data to predict cancer prognosis. Moreover, the framework provides an effective way to discover cancer prognosis-related genes.</p>
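The abstract names the integration step (a denoising Autoencoder over multi-omics inputs) but not its architecture or training details. The following is a minimal, dependency-free Python sketch of the general technique, assuming the omics features are concatenated into one vector per patient. The single tanh hidden layer, masking-noise corruption, squared-error loss, plain SGD, and every name (`DenoisingAutoencoder`, `fit`, `toy_multi_omics`) are hypothetical choices, not the authors' implementation. The defining denoising idea is that the network receives a corrupted input but is trained to reconstruct the clean one.

```python
import math
import random


def corrupt(x, drop_rate, rng):
    """Masking noise (assumed corruption type): zero each feature with probability drop_rate."""
    return [0.0 if rng.random() < drop_rate else v for v in x]


class DenoisingAutoencoder:
    """Hypothetical single hidden layer: h = tanh(W x + b), x_hat = V h + c (linear decoder)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(n_in)
        self.W = [[rng.uniform(-scale, scale) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b = [0.0] * n_hidden
        self.V = [[rng.uniform(-scale, scale) for _ in range(n_hidden)] for _ in range(n_in)]
        self.c = [0.0] * n_in

    def encode(self, x):
        """Learned representation; these features would feed a downstream risk model."""
        return [math.tanh(sum(w * xk for w, xk in zip(row, x)) + bj)
                for row, bj in zip(self.W, self.b)]

    def decode(self, h):
        return [sum(v * hj for v, hj in zip(row, h)) + ci for row, ci in zip(self.V, self.c)]

    def loss(self, x):
        """Squared reconstruction error 0.5 * ||decode(encode(x)) - x||^2 on a clean input."""
        x_hat = self.decode(self.encode(x))
        return 0.5 * sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))

    def sgd_step(self, x_clean, x_noisy, lr):
        """One SGD step: encode the CORRUPTED input, compare the output with the CLEAN input."""
        h = self.encode(x_noisy)
        err = [xh - xc for xh, xc in zip(self.decode(h), x_clean)]
        # Gradient w.r.t. hidden pre-activations, computed before V is modified.
        grad_pre = [(1.0 - hj * hj) * sum(self.V[i][j] * err[i] for i in range(len(err)))
                    for j, hj in enumerate(h)]
        for i, e in enumerate(err):  # decoder update
            for j, hj in enumerate(h):
                self.V[i][j] -= lr * e * hj
            self.c[i] -= lr * e
        for j, g in enumerate(grad_pre):  # encoder update
            for k, xk in enumerate(x_noisy):
                self.W[j][k] -= lr * g * xk
            self.b[j] -= lr * g


def fit(model, data, epochs=50, lr=0.005, drop_rate=0.2, seed=1):
    """Plain SGD over shuffled samples; all hyperparameters are illustrative assumptions."""
    rng = random.Random(seed)
    order = list(range(len(data)))
    for _ in range(epochs):
        rng.shuffle(order)
        for idx in order:
            x = data[idx]
            model.sgd_step(x, corrupt(x, drop_rate, rng), lr)
    return model


def toy_multi_omics(n, seed=0):
    """Hypothetical data: two 'omics' blocks driven by 2 shared latent factors, concatenated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
        mrna = [z1, z2, z1 + z2, z1 - z2]
        methylation = [0.5 * z1, -z2, 0.8 * z2 + 0.2 * z1]
        rows.append([v + rng.gauss(0, 0.1) for v in mrna + methylation])
    return rows


if __name__ == "__main__":
    data = toy_multi_omics(64)
    dae = fit(DenoisingAutoencoder(n_in=7, n_hidden=3), data)
    features = [dae.encode(x) for x in data]  # one 3-dim representation per patient
```

In this sketch the encoded vectors stand in for the "learned representative features"; the abstract does not specify which risk model consumes them, so that step is left out.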
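Performance throughout the abstract is reported as the concordance index (C-index). As a sketch, assuming Harrell's formulation and that a higher predicted risk should mean a shorter survival time, the C-index is the fraction of comparable patient pairs whose risk ordering agrees with their observed survival ordering: 0.5 corresponds to random ranking and 1.0 to perfect ranking. The function name `harrell_c_index` is hypothetical, and pairs with tied survival times are simply skipped here for brevity.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C-index for right-censored data (simplified: tied times are skipped).

    A pair (i, j) is comparable when patient i had an observed event (events[i] truthy)
    and times[i] < times[j]. It is concordant when risks[i] > risks[j];
    tied risks count as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(len(times)):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

For example, with times `[1, 2, 3]`, events `[0, 1, 1]`, and risks `[9, 2, 1]`, the censored first patient never serves as the earlier member of a pair, so only the pair formed by the second and third patients counts and the C-index is 1.0.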
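The abstract reports that high-risk and low-risk breast cancer patients were significantly separated (p-values < 0.05) but does not name the statistical test. The log-rank test is a standard way to compare survival between two groups, so the sketch below assumes it. `logrank_test` is a hypothetical helper that returns the chi-square statistic (1 degree of freedom) and its p-value, using the identity P(chi2 > x) = erfc(sqrt(x / 2)) for one degree of freedom.

```python
import math


def logrank_test(times, events, groups):
    """Two-group log-rank test (assumed test). groups[i] is 1 for high-risk, 0 for low-risk.

    Returns (chi-square statistic with 1 df, p-value).
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e = 0.0  # observed minus expected events in group 1
    variance = 0.0
    for t in event_times:
        at_risk = [i for i, ti in enumerate(times) if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for i in at_risk if groups[i] == 1)
        d = sum(1 for i in at_risk if times[i] == t and events[i])
        d1 = sum(1 for i in at_risk if times[i] == t and events[i] and groups[i] == 1)
        o_minus_e += d1 - d * n1 / n
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Groups that die at the same times yield a statistic of 0 and a p-value of 1, whereas a high-risk group whose events all precede those of the low-risk group yields a small p-value.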
Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Biology