A New Method for Predicting Plant Proteins Function Based on Multi Label Classification Algorithm

Shuai Chen, Juan Wang, Maozu Guo
DOI: https://doi.org/10.1109/bibm55620.2022.9994961
2022-01-01
Abstract: Protein function annotation is a central task in bioinformatics. Annotating a large number of protein functions experimentally is unrealistic, so automated prediction of protein function is required. Studying plant proteins can help in cultivating new plant varieties. However, existing methods mainly predict a single function for each plant protein, and most proteins have only sequence information available. Sequence-based multi-function prediction for plant proteins has therefore become a focus of research. In this study, we propose PlantGO, a sequence-based method for predicting the functions of plant proteins. We formulate functional annotation as a multi-label classification problem over Gene Ontology (GO) terms and predict multiple functions for each plant protein. PlantGO extracts features from three aspects and fuses the features within each aspect. To avoid redundant information in the features, PlantGO uses a random forest to select features, yielding three groups of optimal features. PlantGO then predicts protein function with the multi-label learning algorithm MLKNN, which proves effective and accurate for this task. Finally, PlantGO uses ensemble learning to integrate all models into a unified model, further improving prediction performance. On an independent test, PlantGO achieves an Accuracy of 0.936, an F1 Score of 0.944, and a Ranking loss of 0.011, outperforming the previous method. We have developed an online server to predict the functions of plant proteins.
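The MLKNN and ensemble steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it is a plain-Python rendering of the standard ML-kNN rule (a smoothed prior per label combined with the likelihood of the neighbour label count), and it assumes Euclidean distance, k = 3, smoothing s = 1, and a toy two-label dataset. The abstract does not say how PlantGO combines its models, so the probability averaging in `ensemble_predict` is an assumption. The random-forest feature selection step is omitted, and all class and function names are hypothetical.

```python
import math


def knn_indices(X, query, k, skip=None):
    """Indices of the k training points closest to `query` (Euclidean distance)."""
    order = sorted((i for i in range(len(X)) if i != skip),
                   key=lambda i: math.dist(X[i], query))
    return order[:k]


class MLkNN:
    """Toy ML-kNN: one Bayesian decision per label, based on neighbour label counts."""

    def __init__(self, k=3, s=1.0):
        self.k, self.s = k, s  # neighbourhood size and smoothing (assumed values)

    def fit(self, X, Y):
        n, q, k, s = len(X), len(Y[0]), self.k, self.s
        self.X, self.Y = X, Y
        # Smoothed prior probability that each label is present.
        self.prior = [(s + sum(y[l] for y in Y)) / (2 * s + n) for l in range(q)]
        # c1[l][j]: training points WITH label l that have j neighbours carrying l.
        # c0[l][j]: the same count for training points WITHOUT label l.
        c1 = [[0] * (k + 1) for _ in range(q)]
        c0 = [[0] * (k + 1) for _ in range(q)]
        for i in range(n):
            nbrs = knn_indices(X, X[i], k, skip=i)
            for l in range(q):
                j = sum(Y[m][l] for m in nbrs)
                (c1 if Y[i][l] else c0)[l][j] += 1
        # Smoothed likelihoods P(j neighbours carry l | label present / absent).
        self.like1 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c1]
        self.like0 = [[(s + c) / (s * (k + 1) + sum(row)) for c in row] for row in c0]
        return self

    def predict_proba(self, x):
        nbrs = knn_indices(self.X, x, self.k)
        probs = []
        for l, p1 in enumerate(self.prior):
            j = sum(self.Y[m][l] for m in nbrs)
            present = p1 * self.like1[l][j]
            absent = (1 - p1) * self.like0[l][j]
            probs.append(present / (present + absent))
        return probs

    def predict(self, x):
        return [int(p > 0.5) for p in self.predict_proba(x)]


def ensemble_predict(models, views):
    """Average per-label probabilities over models, one feature view per model.

    Averaging is an assumption; the abstract does not specify the ensemble rule.
    """
    all_probs = [m.predict_proba(v) for m, v in zip(models, views)]
    mean = [sum(col) / len(col) for col in zip(*all_probs)]
    return [int(p > 0.5) for p in mean]


# Toy data: two clusters, each associated with one of two hypothetical GO terms.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5), (5, 6), (6, 5), (6, 6)]
Y = [[1, 0]] * 4 + [[0, 1]] * 4
model = MLkNN(k=3).fit(X, Y)
print(model.predict((0.5, 0.5)))
print(model.predict((5.5, 5.5)))
```

In a pipeline shaped like the one in the abstract, one such model would be fitted on each of the three selected feature groups, and `ensemble_predict` would receive the three models together with the three corresponding feature vectors of a query protein.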