ECD-CDGI: An efficient energy-constrained diffusion model for cancer driver gene identification
Tao Wang,Linlin Zhuo,Yifan Chen,Xiangzheng Fu,Xiangxiang Zeng,Quan Zou
DOI: https://doi.org/10.1371/journal.pcbi.1012400
2024-09-03
PLoS Computational Biology
Abstract: The identification of cancer driver genes (CDGs) is challenging because of the intricate interdependencies among genes and the influence of measurement errors and noise. We propose a novel energy-constrained diffusion (ECD)-based model for identifying CDGs, termed ECD-CDGI. This model is the first to design an ECD-Attention encoder by combining the ECD technique with an attention mechanism. The ECD-Attention encoder generates robust gene representations that reveal the complex interdependencies among genes while reducing the impact of data noise. We concatenate topological embeddings, extracted from gene-gene networks through graph transformers, to these gene representations. Extensive experiments across three testing scenarios show that the ECD-CDGI model not only identifies known CDGs proficiently but also efficiently uncovers unknown potential CDGs. Furthermore, compared to GNN-based approaches, the ECD-CDGI model is less constrained by existing gene-gene networks, which enhances its capability to identify CDGs. Additionally, ECD-CDGI is open-source and freely available. We have also launched the model as a free online tool designed to expedite research on CDG identification.
Cancer has become a major disease threatening human life and health. Cancer usually originates from abnormal gene activities, such as mutations and copy number variations. Mutations in cancer driver genes are crucial for the selective growth of tumor cells. Identifying cancer driver genes is essential in cancer-related research and treatment strategies, as it helps explain how cancer arises and develops. However, complex gene-gene interactions, measurement errors, and the prevalence of unlabeled data significantly complicate the identification of these driver genes.
We developed a new method that integrates an energy-constrained diffusion mechanism with an attention mechanism to uncover implicit gene dependencies in biomolecular networks and generate robust gene representations. Extensive experiments demonstrated that our model accurately identifies known cancer driver genes and effectively discovers potential ones. Furthermore, we analyzed and predicted patient-specific mutated genes, enhancing our understanding of their pathogenesis and advancing precision medicine. In summary, our method offers a promising tool for advancing the identification of cancer driver genes.
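The general pattern described above, attention-weighted diffusion of gene representations followed by concatenation with topological embeddings, can be illustrated with a minimal sketch. This is not the authors' implementation: the update rule `z_i <- (1 - tau) * z_i + tau * sum_j S_ij * z_j`, the dot-product softmax used for `S`, the step size `tau`, and all function names are assumptions made for illustration, and the energy function that constrains the diffusion in ECD-CDGI is not modeled here.

```python
import math


def attention_weights(z):
    """Row-normalized softmax over pairwise dot products (assumed similarity)."""
    n = len(z)
    weights = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(z[i], z[j])) for j in range(n)]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


def diffusion_step(z, tau=0.5):
    """One explicit step: z_i <- (1 - tau) * z_i + tau * sum_j S_ij * z_j."""
    s = attention_weights(z)
    n, dim = len(z), len(z[0])
    out = []
    for i in range(n):
        mixed = [sum(s[i][j] * z[j][d] for j in range(n)) for d in range(dim)]
        out.append([(1 - tau) * z[i][d] + tau * mixed[d] for d in range(dim)])
    return out


def concat_topology(z, topo):
    """Append a (hypothetical) topological embedding to each gene representation."""
    return [zi + ti for zi, ti in zip(z, topo)]


# Toy example: two genes with 2-dimensional features (made-up values).
genes = [[1.0, 0.0], [0.0, 1.0]]
topology = [[0.1, 0.2], [0.3, 0.4]]  # placeholder topological embeddings
combined = concat_topology(diffusion_step(genes, tau=0.5), topology)
```

Because every row of `S` sums to 1 and `tau` lies in [0, 1], each updated representation is a convex combination of the input representations, so one gene's noisy features are smoothed toward those of the genes it attends to. In this sketch the attention weights are computed from the features alone rather than read from a fixed gene-gene network.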
biochemical research methods,mathematical & computational biology