Topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction

Joshua Zhi En Tan, JunJie Wee, Xue Gong, Kelin Xia
2024-07-12
Abstract: Recently, therapeutic peptides have shown great promise for cancer treatment. To discover potent anticancer peptides, artificial intelligence (AI)-based approaches have been developed to systematically screen potential candidates. However, the lack of efficient peptide featurization has become a bottleneck for these machine learning models. In this paper, we propose a topology-enhanced machine learning model (Top-ML) for anticancer peptide prediction. Top-ML employs peptide topological features derived from sequence "connection" information and characterized by vector and spectral descriptors. The Top-ML model has been validated on two widely used AntiCP 2.0 benchmark datasets and achieves state-of-the-art performance. Our results highlight the potential of novel topology-based featurization to accelerate the identification of anticancer peptides.
Quantitative Methods, Machine Learning, General Topology, Biomolecules
What problem does this paper attempt to address?
### Problems the Paper Attempts to Solve

This paper addresses a bottleneck in the prediction of anticancer peptides (ACPs): how to featurize peptides efficiently enough to improve the performance of machine learning models. To this end, the research team proposes a topology-enhanced machine learning model (Top-ML) for predicting peptides with anticancer properties.

### Specific Problems and Methods

1. **Background and Challenges**:
   - Cancer remains one of the leading causes of death worldwide. Traditional treatments such as radiotherapy and chemotherapy have significant limitations, including systemic toxicity and drug resistance.
   - Anticancer peptides have emerged as a promising alternative for cancer treatment because of their high specificity, low toxicity, and ease of chemical modification.
   - However, experimental methods for discovering and designing anticancer peptides are costly, time-consuming, and labor-intensive, which makes large-scale screening difficult.
2. **Research Methods**:
   - **Topological Features**: Topological data analysis (TDA) is used to extract topological features from peptide sequences, including vector features and spectral features.
   - **Feature Types**: Four types of peptide features are combined: the Magnus vector, the natural vector, terminal composition features, and spectral features.
   - **Machine Learning Model**: The combined features are used to train an Extra Trees classifier, and the model's performance is validated on benchmark datasets.
3. **Results and Contributions**:
   - The proposed Top-ML model achieved state-of-the-art performance on two widely used AntiCP 2.0 benchmark datasets.
   - The results indicate that topological features can substantially improve the accuracy of anticancer peptide identification.

Through these methods, the research team demonstrates the potential of mathematically derived peptide features for improving anticancer peptide identification.
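The natural vector listed among the feature types has a standard alignment-free form for protein sequences: for each of the 20 amino acids, its count, the mean of its positions, and the second normalized central moment of those positions, giving 60 numbers. The sketch below assumes that standard definition; Top-ML may use different moments or normalization, and `natural_vector` is a hypothetical helper name.

```python
from collections import defaultdict

# Fixed order of the 20 standard amino acids so every vector has aligned columns.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def natural_vector(seq: str) -> list[float]:
    """Hypothetical helper: 60-dim natural vector of a peptide sequence.

    Layout: [n_k for all k] + [mu_k for all k] + [D2_k for all k], where
      n_k  = number of occurrences of amino acid k,
      mu_k = mean 1-based position of k,
      D2_k = sum((p - mu_k)**2) / (n_k * n), with n the sequence length.
    Amino acids absent from the sequence contribute zeros; non-standard
    letters are ignored but still count toward n.
    """
    n = len(seq)
    positions = defaultdict(list)
    for i, aa in enumerate(seq, start=1):
        positions[aa].append(i)

    counts, means, moments = [], [], []
    for aa in AMINO_ACIDS:
        pos = positions.get(aa, [])
        n_k = len(pos)
        mu = sum(pos) / n_k if n_k else 0.0
        d2 = sum((p - mu) ** 2 for p in pos) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu)
        moments.append(d2)
    return counts + means + moments
```

For example, in `"AKA"` the entries for A are a count of 2, a mean position of 2, and a second moment of ((1 - 2)^2 + (3 - 2)^2) / (2 * 3) = 1/3. Unlike plain composition, the position moments distinguish sequences with the same residues in different arrangements.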
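Terminal composition features summarize which residues sit at the two ends of a peptide. A minimal sketch, assuming the feature is the amino acid composition of the first k (N-terminal) and last k (C-terminal) residues; `terminal_composition` and the default k = 5 are illustrative assumptions, not values taken from the paper.

```python
# Fixed order of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(fragment: str) -> list[float]:
    """Fraction of each standard amino acid in `fragment` (all zeros if empty)."""
    if not fragment:
        return [0.0] * len(AMINO_ACIDS)
    return [fragment.count(aa) / len(fragment) for aa in AMINO_ACIDS]


def terminal_composition(seq: str, k: int = 5) -> list[float]:
    """Hypothetical helper: 40-dim vector = N-terminal composition + C-terminal composition.

    Uses the first k and last k residues; k = 5 is an assumed default.
    If the peptide is shorter than k, both ends cover the whole sequence.
    """
    n_term = seq[:k]
    c_term = seq[-k:] if k > 0 else ""  # guard: seq[-0:] would return the whole sequence
    return composition(n_term) + composition(c_term)
```

Keeping the two termini as separate 20-dim blocks, rather than pooling them, lets a classifier weight N- and C-terminal residues differently.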
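The summary does not spell out how the spectral features are built from the sequence "connection" information. Purely as a hypothetical illustration of a spectral descriptor with a topological reading, the sketch below joins occurrences of the same amino acid that lie within a fixed window, computes the eigenvalues of the resulting graph Laplacian L = D - A, and keeps three summaries per residue type. The window-based graph, `window = 3`, the chosen summaries, and the helper names are all assumptions, not the construction used in Top-ML.

```python
import numpy as np

# Fixed order of the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def laplacian_spectrum(positions: list[int], window: int) -> np.ndarray:
    """Eigenvalues of L = D - A, where occurrences closer than `window` + 1 are joined."""
    m = len(positions)
    adj = np.zeros((m, m))
    for i in range(m):
        for j in range(i + 1, m):
            if abs(positions[i] - positions[j]) <= window:
                adj[i, j] = adj[j, i] = 1.0
    lap = np.diag(adj.sum(axis=1)) - adj
    return np.linalg.eigvalsh(lap)  # real and sorted ascending for a symmetric matrix


def spectral_features(seq: str, window: int = 3, tol: float = 1e-8) -> list[float]:
    """Hypothetical helper: per amino acid, (# zero eigenvalues,
    smallest nonzero eigenvalue, mean eigenvalue) -> 60-dim vector."""
    feats = []
    for aa in AMINO_ACIDS:
        pos = [i for i, c in enumerate(seq) if c == aa]
        if not pos:
            feats.extend([0.0, 0.0, 0.0])
            continue
        eig = laplacian_spectrum(pos, window)
        nonzero = eig[eig > tol]
        feats.extend([
            float(np.sum(eig <= tol)),  # = number of connected components (Betti-0)
            float(nonzero.min()) if nonzero.size else 0.0,  # algebraic connectivity of the graph if connected
            float(eig.mean()),
        ])
    return feats
```

The number of zero Laplacian eigenvalues equals the number of connected components of the graph, which is why Laplacian spectra are a common bridge between spectral and topological (Betti-0) information.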
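The final step, concatenating the feature families and training an Extra Trees classifier, can be sketched with scikit-learn's `ExtraTreesClassifier`. This assumes scikit-learn as the toolkit, a stratified random 80/20 split, 300 trees, and accuracy plus the Matthews correlation coefficient (MCC) as metrics; none of these settings come from the paper, `train_extra_trees` is a hypothetical helper, and the random matrices in the demo are synthetic stand-ins rather than real peptide features.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score, matthews_corrcoef
from sklearn.model_selection import train_test_split


def train_extra_trees(feature_blocks, labels, seed=0):
    """Hypothetical helper: concatenate per-family feature blocks column-wise,
    fit an Extra Trees classifier, and score it on a held-out 20% split.

    Returns (fitted model, accuracy, MCC).
    """
    X = np.hstack(feature_blocks)  # one row per peptide, columns = all feature families
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=seed
    )
    model = ExtraTreesClassifier(n_estimators=300, random_state=seed)  # assumed settings
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    return model, accuracy_score(y_test, pred), matthews_corrcoef(y_test, pred)


if __name__ == "__main__":
    # Synthetic stand-ins for three feature families (NOT real peptide data).
    rng = np.random.default_rng(0)
    n = 400
    blocks = [rng.normal(size=(n, 60)), rng.normal(size=(n, 40)), rng.normal(size=(n, 60))]
    labels = (blocks[0][:, 0] + blocks[1][:, 0] > 0).astype(int)
    _, acc, mcc = train_extra_trees(blocks, labels)
    print(f"accuracy={acc:.3f}  MCC={mcc:.3f}")
```

Extra Trees picks split thresholds at random rather than optimizing them, which makes training fast and tends to reduce variance on wide, heterogeneous feature vectors such as a concatenation of several descriptor families.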