prPred-DRLF: Plant R protein predictor using deep representation learning features

Yansu Wang, Lei Xu, Quan Zou, Chen Lin
DOI: https://doi.org/10.1002/pmic.202100161
2021-10-14
PROTEOMICS
Abstract: Plant resistance (R) proteins play a significant role in the detection of pathogen invasion. Accurately predicting plant R proteins is a key task in phytopathology. Most plant R protein predictors depend on traditional feature extraction methods. Recently, deep representation learning methods have been successfully applied to protein classification problems. Motivated by this, we propose a new computational approach, called prPred-DRLF, which uses deep representation learning feature models to encode the amino acids as numerical vectors. The results show that the fused features of bidirectional long short-term memory (BiLSTM) embedding and unified representation (UniRep) embedding perform better than other features for plant R protein identification using a light gradient boosting machine (LGBM) classifier. On an independent test set, the model achieved an accuracy of 0.956, an F1-score of 0.933 and an AUC of 0.997. Compared with the state-of-the-art prPred and HMMER methods, prPred-DRLF shows an overall improvement in accuracy, F1-score, AUC and recall. prPred-DRLF is a higher-performance plant R protein prediction tool based on two kinds of deep representation learning technologies and offers a user-friendly interface for inspecting possible plant R proteins. We hope that prPred-DRLF will become a useful tool for biological research. A user-friendly webserver for prPred-DRLF is freely accessible at http://lab.malab.cn/soft/prPred-DRLF. The Python script can be downloaded from https://github.com/Wangys-prog/prPred-DRLF.
biochemistry & molecular biology, biochemical research methods
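
The abstract describes fusing two embeddings per protein and reporting accuracy, F1-score, recall and AUC. The following is a minimal sketch of those two ideas only, assuming that "fused features" means row-wise concatenation of the two embedding vectors. It is not the authors' implementation: the function names (fuse_features, binary_metrics, roc_auc), the toy vectors, the labels and the 0.5 threshold are all hypothetical, and the LGBM classifier itself is not shown.

```python
from typing import Dict, List, Sequence


def fuse_features(a: Sequence[Sequence[float]],
                  b: Sequence[Sequence[float]]) -> List[List[float]]:
    """Concatenate two per-protein embedding matrices row by row.

    Assumption: row i of `a` and row i of `b` describe the same protein.
    """
    if len(a) != len(b):
        raise ValueError("both feature sets must cover the same proteins")
    return [list(x) + list(y) for x, y in zip(a, b)]


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for labels in {0, 1}."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one example of each class.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy stand-ins for the two embeddings of two proteins (hypothetical values).
    bilstm_like = [[0.1, 0.2], [0.3, 0.4]]
    unirep_like = [[0.5], [0.6]]
    print(fuse_features(bilstm_like, unirep_like))

    # Toy labels and classifier scores (hypothetical values).
    y_true = [1, 1, 1, 0, 0, 0, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.2, 0.1, 0.05]
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    print(binary_metrics(y_true, y_pred))
    print(roc_auc(y_true, scores))
```

In practice the fused matrix would be passed to a classifier such as LGBM, and the metrics would be computed on a held-out independent test set rather than on toy values.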