Anti-Cancer Peptides Identification and Activity Type Classification with Protein Sequence Pre-training

Shaokai Wang, Bin Ma
DOI: https://doi.org/10.1109/jbhi.2024.3358632
IF: 7.7
2024-01-01
IEEE Journal of Biomedical and Health Informatics
Abstract: Cancer remains a significant global health challenge, responsible for millions of deaths annually. Addressing this issue necessitates the discovery of novel anti-cancer drugs. Anti-cancer peptides (ACPs), with their unique ability to selectively target cancer cells, offer new hope for discovering anti-cancer drugs with few side effects. However, the process of discovering novel ACPs is both time-consuming and costly. Therefore, there is an urgent need for a computational method that can predict whether a given peptide is an ACP and classify its specific functional types. In this paper, we introduce DUO-ACP, a model serving dual roles in ACP prediction: identification and functional type classification. DUO-ACP employs two embedding modules to acquire knowledge about global protein features and local ACP characteristics, complemented by a prediction module. When assessed on two publicly available datasets for each task, DUO-ACP surpasses all existing methods, achieving an ACP identification accuracy of 89.5% and a macro-averaged AUC of 88.6% in ACP functional type classification. We further interpret the contribution of each part of our model, including the two types of embeddings as well as ensemble learning. On a newly curated dataset, the prediction results of DUO-ACP closely match existing literature, highlighting DUO-ACP's generalization capabilities on previously unseen data and its potential for discovering novel ACPs. The source code of DUO-ACP is publicly available on GitHub (https://github.com/waterlooms/DUO-ACP).
computer science, interdisciplinary applications, mathematical & computational biology, medical informatics, information systems