Trans-Driver: a deep learning approach for cancer driver gene discovery with multi-omics data

Hai Yang, Lei Zhang, Dan Zhou, Dongdong Li, Jing Zhang, Zhe Wang
DOI: https://doi.org/10.1101/2022.06.07.495072
2022-01-01
Abstract: Driver genes play a crucial role in the growth of cancer cells. Accurate identification of cancer driver genes strengthens the understanding of cancer pathogenesis and is conducive to the development of cancer treatments and drug-target driver genes. However, because multi-omics data are diverse and complex, identifying cancer drivers remains challenging. In this study, we propose Trans-Driver, a deep supervised learning method with a novel transformer network that integrates multi-omics data and learns the differences and associations between omics data types for cancer driver discovery. Compared with other state-of-the-art driver gene identification methods, Trans-Driver achieved excellent performance on TCGA and CGC data. Among 20,000 protein-coding genes, Trans-Driver reported 185 candidate driver genes, of which 103 (about 55%) are included in the gold-standard CGC data set. Finally, we analyzed the contribution of each feature to driver gene identification and found that integrating multi-omics data improves the performance of our method compared with using only somatic mutation data. Detailed analysis showed that the candidate drivers are clinically meaningful, demonstrating the practicability of Trans-Driver.

### Author summary
Many methods have been developed to identify cancer driver genes. However, most of them use single-omics data, and multi-omics-based methods are rare. Trans-Driver uses deep learning to process multi-omics data and learn the relationships among omics data types for cancer driver gene prediction. We predicted 185 candidate cancer driver genes among 20,000 protein-coding genes. We also performed cancer driver gene prediction on 33 cancer types and identified the driver genes corresponding to each type. We observed that recent studies have shown the predicted cancer driver genes to play a role in cancer progression. Our multi-omics method for cancer driver gene identification performs better than using mutation data alone.

### Competing Interest Statement
The authors have declared that no competing interests exist.