UsIL-6: An unbalanced learning strategy for identifying IL-6 inducing peptides by undersampling technique

Yan-hong Liao, Shou-zhi Chen, Yan-nan Bin, Jian-ping Zhao, Xin-long Feng, Chun-hou Zheng
DOI: https://doi.org/10.1016/j.cmpb.2024.108176
Impact factor: 6.1
2024-04-29
Computer Methods and Programs in Biomedicine
Abstract:
Background and objective: Interleukin-6 (IL-6) is a critical factor in the early warning, monitoring, and prognosis of the inflammatory storm in COVID-19 cases. IL-6 inducing peptides, which can induce production of the cytokine IL-6, are important for the development of diagnostics and immunotherapies. Although existing methods have had some success in predicting IL-6 inducing peptides, their performance in practical applications still leaves room for improvement.
Methods: In this study, we proposed UsIL-6, a high-performance bioinformatics tool for identifying IL-6 inducing peptides. First, we extracted five groups of physicochemical-property and sequence-structure features from IL-6 inducing peptide sequences, obtaining a 636-dimensional feature vector. We then processed the data with the NearMiss3 undersampling method and the StandardScaler normalization method. Next, the Boruta feature selection method reduced this to a 40-dimensional optimal feature vector. Finally, we combined this feature vector with an extremely randomized trees (Extra Trees) classifier to build the final model, UsIL-6.
Results: On the independent test dataset, UsIL-6 achieved an AUC of 0.87 and a balanced accuracy (BACC) of 0.808, indicating better performance than existing methods in IL-6 inducing peptide recognition.
Conclusions: The performance comparison on the independent test dataset confirmed that, among the compared methods, UsIL-6 achieved the highest performance, the best robustness, and the strongest generalization ability. We hope that UsIL-6 will become a valuable method to identify, annotate, and characterize new IL-6 inducing peptides.
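The processing chain described in the abstract (normalization, NearMiss-3 undersampling, Boruta selection, Extra Trees classification, then AUC and BACC on a held-out set) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' UsIL-6 code. The synthetic data from make_classification is a hypothetical stand-in for the 636-dimensional peptide features; nearmiss3 and boruta_like are hypothetical helper names; nearmiss3 follows the NearMiss-3 idea (the same one behind imbalanced-learn's NearMiss(version=3)), while boruta_like is a simplified Boruta-style selector (shadow features plus a one-sided binomial test, without the full algorithm's tentative-feature handling or multiple-testing correction). Scaling before undersampling and every parameter value are assumptions. BACC is the mean of sensitivity and specificity.

```python
import math

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.metrics import balanced_accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler


def nearmiss3(X, y, n_neighbors_ver3=3, n_neighbors=3):
    """Hypothetical helper: binary NearMiss-3, shrinking the majority class."""
    classes, counts = np.unique(y, return_counts=True)
    minority, majority = classes[np.argmin(counts)], classes[np.argmax(counts)]
    X_min, X_maj = X[y == minority], X[y == majority]
    # Step 1: short-list majority samples that are among the n_neighbors_ver3
    # nearest majority neighbours of at least one minority sample.
    nn_maj = NearestNeighbors(n_neighbors=n_neighbors_ver3).fit(X_maj)
    shortlist = np.unique(nn_maj.kneighbors(X_min, return_distance=False))
    # Step 2: keep the short-listed samples whose mean distance to their
    # n_neighbors nearest minority samples is the largest.
    nn_min = NearestNeighbors(n_neighbors=n_neighbors).fit(X_min)
    dist, _ = nn_min.kneighbors(X_maj[shortlist])
    n_keep = min(len(X_min), len(shortlist))
    keep = shortlist[np.argsort(dist.mean(axis=1))[::-1][:n_keep]]
    X_res = np.vstack([X_min, X_maj[keep]])
    y_res = np.concatenate([np.full(len(X_min), minority), np.full(n_keep, majority)])
    return X_res, y_res


def boruta_like(X, y, n_iter=20, alpha=0.05, seed=0):
    """Hypothetical, simplified Boruta: keep features that beat the best shadow
    feature more often than chance (one-sided binomial test with p = 0.5)."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)
    for i in range(n_iter):
        # Shadow features: every column shuffled independently, breaking any link to y.
        shadow = np.column_stack([rng.permutation(col) for col in X.T])
        forest = RandomForestClassifier(n_estimators=100, random_state=i)
        forest.fit(np.hstack([X, shadow]), y)
        imp = forest.feature_importances_
        hits += imp[:n_features] > imp[n_features:].max()

    def upper_tail(h):
        # P(H >= h) for H ~ Binomial(n_iter, 0.5)
        return sum(math.comb(n_iter, k) for k in range(h, n_iter + 1)) / 2 ** n_iter

    pvals = np.array([upper_tail(int(h)) for h in hits])
    return np.flatnonzero(pvals < alpha)  # indices of retained features


# Assumption: synthetic, imbalanced stand-in data (about 10% positives).
X, y = make_classification(n_samples=1500, n_features=60, n_informative=8,
                           weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          stratify=y, random_state=0)

scaler = StandardScaler().fit(X_tr)          # fit on the training split only
X_tr_s, X_te_s = scaler.transform(X_tr), scaler.transform(X_te)
X_bal, y_bal = nearmiss3(X_tr_s, y_tr)       # undersample the training split only
selected = boruta_like(X_bal, y_bal)

clf = ExtraTreesClassifier(n_estimators=500, random_state=0)
clf.fit(X_bal[:, selected], y_bal)
proba = clf.predict_proba(X_te_s[:, selected])[:, 1]
auc = roc_auc_score(y_te, proba)
bacc = balanced_accuracy_score(y_te, clf.predict(X_te_s[:, selected]))
print(f"{len(selected)} features kept, AUC={auc:.3f}, BACC={bacc:.3f}")
```

Resampling and feature selection touch only the training split, so the test split keeps its natural class imbalance; that is where balanced accuracy is more informative than plain accuracy. The numbers this sketch prints come from synthetic data and say nothing about the results reported for UsIL-6.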
Categories: Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Medical Informatics; Theory & Methods