Prediction of inhibitory peptides against E. coli with desired MIC value

Nisha Bajiya, Nishant Kumar, Gajendra P.S. Raghava
DOI: https://doi.org/10.1101/2024.07.18.604028
2024-07-22
Abstract: In the past, several methods have been developed for predicting antibacterial and antimicrobial peptides, but only limited attempts have been made to predict their minimum inhibitory concentration (MIC) values. In this study, we trained our models on 3,143 peptides and validated them on 786 peptides whose MIC values against Escherichia coli (E. coli) have been determined experimentally. Correlation analysis reveals that Composition Enhanced Transition and Distribution (CeTD) attributes correlate strongly with MIC values. We first used a BLAST-based similarity search to estimate the MIC values of peptides but found it inadequate for prediction. Next, we developed machine-learning regression models using a wide range of features, including peptide composition, binary profiles, and embeddings from large language models. We applied feature selection techniques such as minimum Redundancy Maximum Relevance (mRMR) to select the most relevant features for building prediction models. Our random forest regressor, trained on the selected features, achieved a correlation coefficient (R) of 0.78, an R-squared (R²) of 0.59, and a root mean squared error (RMSE) of 0.53 on the validation dataset. Our best model outperforms existing methods when benchmarked on an independent dataset of 498 peptides that inhibit E. coli. A key feature of EIPpred, the web-based platform developed in this study, is that it allows users to identify or design peptides that inhibit E. coli with a desired MIC value (https://webs.iiitd.edu.in/raghava/eippred).
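The abstract names several general techniques without specifying how they are implemented: composition features, mRMR feature selection, and the R, R² and RMSE evaluation metrics. The dependency-free Python sketch below shows what these techniques compute. It is not the EIPpred code. The helpers `aac`, `pearson_r`, `r_squared`, `rmse` and `mrmr_select` are hypothetical. The sketch assumes MIC is modelled on a log scale (e.g. log10 MIC), which the abstract does not state. It also uses absolute Pearson correlation in place of mutual information inside mRMR, a simplification chosen here for brevity.

```python
import math

# Twenty standard amino acids, one-letter codes (hypothetical feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aac(sequence: str) -> list[float]:
    """Amino acid composition: percentage of each standard residue.

    Hypothetical helper; letters outside the 20 standard residues are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0
    return cov / (sx * sy)


def r_squared(y_true: list[float], y_pred: list[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot (can be negative)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true: list[float], y_pred: list[float]) -> float:
    """Root mean squared error, in the units of the target (assumed log10 MIC)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mrmr_select(columns: list[list[float]], target: list[float], k: int) -> list[int]:
    """Greedy mRMR in difference form: relevance minus mean redundancy.

    |Pearson r| stands in for mutual information here (a simplification).
    `columns` holds one feature per list, with one value per peptide.
    """
    relevance = [abs(pearson_r(col, target)) for col in columns]
    selected: list[int] = []
    remaining = list(range(len(columns)))

    def score(j: int) -> float:
        # First pick: relevance only; later picks penalise overlap with chosen features.
        if not selected:
            return relevance[j]
        redundancy = sum(abs(pearson_r(columns[j], columns[s])) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining and len(selected) < k:
        best = max(remaining, key=score)  # ties go to the lower index
        selected.append(best)
        remaining.remove(best)
    return selected
```

In practice, established libraries would normally replace these hand-rolled functions. For example, scikit-learn provides `RandomForestRegressor` and regression metrics. Writing the functions out only makes the definitions explicit. One consequence is worth noting. R² computed as 1 − SS_res/SS_tot on held-out data is not, in general, equal to the square of Pearson's R, so the two reported figures need not agree exactly.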
Bioinformatics