ScerePhoSite: An interpretable method for identifying fungal phosphorylation sites in proteins using sequence-based features

Chao Wang, Qiang Yang
DOI: https://doi.org/10.1016/j.compbiomed.2023.106798
IF: 7.7
2023-03-23
Computers in Biology and Medicine
Abstract: Protein phosphorylation plays a vital role in signal transduction pathways and diverse cellular processes. To date, a large number of in silico tools have been designed for phosphorylation site identification, but few of them are suitable for identifying fungal phosphorylation sites. This largely hampers the functional investigation of fungal phosphorylation. In this paper, we present ScerePhoSite, a machine learning method for fungal phosphorylation site identification. The sequence fragments are represented by hybrid physicochemical features, and then LightGBM (LGB)-based feature importance combined with a sequential forward search is used to choose the optimal feature subset. As a result, ScerePhoSite outperforms currently available tools and shows more robust and balanced performance. Furthermore, the impact and contribution of specific features on model performance were investigated using SHAP values. We expect ScerePhoSite to be a useful bioinformatics tool that complements hands-on experiments for the pre-screening of possible phosphorylation sites and facilitates our functional understanding of phosphorylation modification in fungi. The source code and datasets are accessible at https://github.com/wangchao-malab/ScerePhoSite/ .
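The abstract does not spell out how the importance ranking and the forward search fit together. One common reading is incremental feature selection: rank all features by importance once, then evaluate the top-1, top-2, ..., top-k prefixes and keep the best-scoring prefix. The sketch below illustrates that reading only. It assumes the importance scores are precomputed (for example from a trained LightGBM model) and takes a hypothetical `evaluate` callback standing in for cross-validated model scoring; the function name `incremental_forward_search` and the toy feature names are illustrative and not taken from the ScerePhoSite code.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def incremental_forward_search(
    importances: Dict[str, float],
    evaluate: Callable[[Sequence[str]], float],
) -> Tuple[List[str], float, List[float]]:
    """Grow a feature subset in importance order and keep the best prefix.

    importances: feature name -> importance score (assumed precomputed,
        e.g. from a LightGBM model; hypothetical input here).
    evaluate: hypothetical callback returning a validation score for a
        candidate subset (higher is better), e.g. cross-validated accuracy.
    Returns (best subset, best score, score for every prefix size).
    """
    # Rank once by descending importance; break ties by name for determinism.
    ranked = sorted(importances, key=lambda f: (-importances[f], f))
    best_subset: List[str] = []
    best_score = float("-inf")
    curve: List[float] = []
    for k in range(1, len(ranked) + 1):
        subset = ranked[:k]  # top-k features by importance
        score = evaluate(subset)
        curve.append(score)
        # Strict '>' keeps the smaller subset when two prefixes tie.
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score, curve


if __name__ == "__main__":
    # Toy importances and a made-up evaluator, purely for illustration.
    toy_importance = {"hydrophobicity": 0.9, "charge": 0.7, "polarity": 0.4, "noise": 0.1}
    gain = {"hydrophobicity": 0.10, "charge": 0.05, "polarity": 0.02, "noise": -0.03}

    def toy_evaluate(subset: Sequence[str]) -> float:
        return 0.5 + sum(gain[f] for f in subset)

    subset, score, curve = incremental_forward_search(toy_importance, toy_evaluate)
    print(subset, round(score, 2))
```

Ranking once and scanning prefixes costs one model evaluation per feature, which is much cheaper than a fully greedy forward selection that re-tests every remaining feature at each step; which variant the authors used should be checked against their repository.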
Engineering, Biomedical; Computer Science, Interdisciplinary Applications; Mathematical & Computational Biology; Biology
What problem does this paper attempt to address?