Abstract: The identification of anticancer peptides (ACPs) is crucial for the development of peptide-based cancer therapies. Classical feature-encoding schemes such as Split Amino Acid Composition (SAAC) and Pseudo Amino Acid Composition (PseAAC) offer only limited feature representation, whereas richer representations can improve the predictive accuracy and efficiency of ACP identification. This research therefore proposes an Extended Dipeptide Composition (EDPC) framework for feature extraction. EDPC extends the dipeptide composition by incorporating local sequence environment information and adapts the CD-HIT framework to remove noise and redundancy. We evaluated the framework in several experiments using four widely used machine learning (ML) algorithms: Support Vector Machine (SVM), Decision Tree (DT), Random Forest (RF), and K-Nearest Neighbor (KNN). The evaluation criteria were accuracy, specificity, sensitivity, precision, recall, and F1-score, and the reliability of the framework was further assessed with statistical significance tests. EDPC outperformed SAAC and PseAAC: the SVM model delivered the highest accuracy, 96.6%, together with significant improvements in specificity, sensitivity, precision, and F1-score across multiple datasets. We attribute the higher classification performance to the enhanced feature representation, which combines local and global sequence profiles. The framework also handles noisy and duplicate features while providing a wide range of feature representations. The proposed framework can therefore be used in clinical applications where ACP identification is essential.
Future work will include extending the evaluation to a larger variety of datasets, incorporating tertiary structural information, and using deep learning techniques to improve the proposed EDPC.
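
For orientation, the baseline that EDPC builds on and the evaluation criteria listed above can be sketched in a few lines of Python. This is only an illustrative sketch of the conventional 400-dimensional dipeptide composition and of the standard confusion-matrix metrics; it is not the EDPC encoding itself, whose local-environment extension and CD-HIT step are not specified in the abstract. The function names `dipeptide_composition` and `binary_metrics` are hypothetical, and normalizing by the number of valid residue pairs (rather than sequence length minus one) is an assumption made to tolerate non-standard residues.

```python
from itertools import product

# The 20 standard amino acids and the 400 ordered dipeptides built from them.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(pair) for pair in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return the 400-dimensional plain dipeptide composition of a peptide.

    Each feature is the frequency of one ordered residue pair among all
    adjacent pairs in the sequence. Pairs containing non-standard residues
    are skipped (an assumption of this sketch, not taken from the paper).
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


def binary_metrics(tp, fp, tn, fn):
    """Compute the evaluation criteria from binary confusion-matrix counts.

    Sensitivity and recall are the same quantity, TP / (TP + FN).
    """
    def ratio(num, den):
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + fp + tn + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "recall": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


if __name__ == "__main__":
    features = dipeptide_composition("ACAC")  # toy sequence, not real data
    print(len(features), features[DIPEPTIDES.index("AC")])
    print(binary_metrics(tp=8, fp=2, tn=9, fn=1))  # made-up counts
```

Feature vectors of this kind would then be passed to a classifier such as an SVM. The toy sequence and the confusion-matrix counts are made up for illustration and do not come from the study.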
