VTP-Identifier: Vesicular Transport Proteins Identification Based on PSSM Profiles and XGBoost

Yue Gong, Benzhi Dong, Zixiao Zhang, Yixiao Zhai, Bo Gao, Tianjiao Zhang, Jingyu Zhang
DOI: https://doi.org/10.3389/fgene.2021.808856
IF: 3.7
2022-01-01
Frontiers in Genetics
Abstract: Vesicular transport proteins are associated with many human diseases and can threaten human health when they undergo pathological changes. Protein function prediction has long been one of the most intensively studied topics in bioinformatics. In this work, we developed a tool, VTP-Identifier, to identify vesicular transport proteins. Our strategy is to extract transition probability composition, autocovariance transformation and other information from the position-specific scoring matrix (PSSM) as feature vectors. EditedNearestNeighbours (ENN) is used to address the imbalance of the data set, and the Max-Relevance-Max-Distance (MRMD) algorithm is adopted to reduce the dimension of the feature vector. We used 5-fold cross-validation and independent test sets to evaluate our model. On the test set, VTP-Identifier performed better than GRU: the accuracy, Matthews correlation coefficient (MCC) and area under the ROC curve (AUC) were 83.6%, 0.531 and 0.873, respectively.
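As an illustration of one of the feature types named in the abstract, the following is a minimal sketch of an autocovariance transformation applied to a PSSM. The abstract does not give the formula, so the sketch assumes a commonly used definition: for each column j and lag g, the mean-centred products of scores g positions apart are averaged over the sequence. The function name `pssm_autocovariance`, the parameter `max_lag` and the toy matrix are hypothetical and are not taken from the paper.

```python
def pssm_autocovariance(pssm, max_lag):
    """Autocovariance features of a PSSM (assumed, commonly used definition).

    pssm    : list of rows, one per residue; each row holds one score per column
              (20 columns for a real PSSM, 2 in the toy example below).
    max_lag : largest distance g between residue positions to consider.
    Returns a flat list ordered by lag first, then by column.
    """
    length = len(pssm)
    n_cols = len(pssm[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")

    # Per-column mean over the whole sequence.
    means = [sum(row[j] for row in pssm) / length for j in range(n_cols)]

    features = []
    for lag in range(1, max_lag + 1):
        for j in range(n_cols):
            # Sum of mean-centred products for positions i and i + lag.
            total = sum(
                (pssm[i][j] - means[j]) * (pssm[i + lag][j] - means[j])
                for i in range(length - lag)
            )
            features.append(total / (length - lag))
    return features


# Toy 4-residue, 2-column matrix (illustrative values, not real PSSM scores).
toy_pssm = [[1.0, 0.0], [3.0, 2.0], [5.0, 4.0], [7.0, 6.0]]
features = pssm_autocovariance(toy_pssm, max_lag=2)
```

Under this definition, a real 20-column PSSM with `max_lag = g` yields a fixed-length vector of 20 × g values regardless of protein length, which is what makes the output usable as classifier input.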