Identifying Multi-Functional Enzyme by Hierarchical Multi-Label Classifier

Quan Zou, Weicheng Chen, Yong Huang, Xiangrong Liu, Yi Jiang
DOI: https://doi.org/10.1166/jctn.2013.2804
2013-01-01
Journal of Computational and Theoretical Nanoscience
Abstract: The overwhelming size of protein databanks, combined with their explosive growth, has made annotating enzyme sequences difficult. Automatic enzyme identification offers a more cost-effective solution than biological experiments. The major challenge for in silico methods is the multi-functional nature of enzyme sequences. To address this problem, this paper develops a two-layer predictor. The first-layer prediction engine classifies a query protein as an enzyme or a non-enzyme; if it is an enzyme, the query automatically proceeds to the second-layer prediction engine, which predicts its enzyme function classes. In this layer, a multi-label classifier detects multi-functional enzymes, i.e., enzymes that belong to two or more function classes. We adopt two feature extraction methods: one extracts 20-D features from the position-specific scoring matrix (PSSM), and the other extracts 188-D features based on the composition and physicochemical properties of the protein. The former achieves high accuracy, while the latter is faster and maintains comparable performance. Experiments show that our two-layer predictor outperforms other enzyme identification methods in recognizing enzyme sequences and detecting multi-functional enzymes. Furthermore, we discover more than 1000 unreported multi-functional enzymes in Swiss-Prot and find that most of them belong to the alpha/beta structural class. Applying our method to watermelon peptides, we also find more than 100 multi-functional enzymes. The dataset and software tools for prediction are available at http://datamining.xmu.edu.cn/software/IME.
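The two feature encodings named in the abstract can be sketched as follows. The abstract does not define them exactly, so this is a sketch under stated assumptions: the 20-D PSSM feature is taken to be the per-column mean of a logistic-squashed L × 20 PSSM, and the 188-D feature is taken to be a 20-D amino-acid composition (AAC) plus a 21-D composition/transition/distribution (CTD) descriptor for each of 8 physicochemical properties (20 + 8 × 21 = 188). The function names `pssm_20d`, `aac`, `ctd`, `composition_features` and the `HYDROPHOBICITY` partition are hypothetical, and only one property partition is shown.

```python
import math
from typing import Dict, List, Sequence

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # fixed order of the 20 standard residues

# Hypothetical example partition (Dubchak-style hydrophobicity classes).
HYDROPHOBICITY: Dict[str, str] = {
    "polar": "RKEDQN",
    "neutral": "GASTPHY",
    "hydrophobic": "CLVIMFW",
}


def pssm_20d(pssm: Sequence[Sequence[float]]) -> List[float]:
    """Mean of each PSSM column over all residues (L x 20 -> 20).

    Scores are squashed with a logistic function first; this
    normalisation is an assumption, not taken from the paper.
    """
    if not pssm:
        raise ValueError("empty PSSM")
    sums = [0.0] * 20
    for row in pssm:
        if len(row) != 20:
            raise ValueError("each PSSM row needs 20 scores")
        for j, score in enumerate(row):
            sums[j] += 1.0 / (1.0 + math.exp(-score))
    return [s / len(pssm) for s in sums]


def aac(seq: str) -> List[float]:
    """20-D amino-acid composition: fraction of each standard residue."""
    if not seq:
        raise ValueError("empty sequence")
    return [seq.count(a) / len(seq) for a in AMINO_ACIDS]


def ctd(seq: str, groups: Dict[str, str]) -> List[float]:
    """21-D composition/transition/distribution for one 3-group property."""
    if len(groups) != 3 or not seq:
        raise ValueError("expected a 3-group partition and a non-empty sequence")
    parts = list(groups.values())
    code = []
    for aa in seq:
        hits = [i for i, members in enumerate(parts) if aa in members]
        if not hits:
            raise ValueError(f"residue {aa!r} not covered by the partition")
        code.append(hits[0])
    n = len(code)
    # Composition: fraction of residues in each group (3 values).
    comp = [code.count(i) / n for i in range(3)]
    # Transition: fraction of adjacent pairs switching between two groups (3 values).
    trans = [
        sum(1 for a, b in zip(code, code[1:]) if {a, b} == {i, j}) / max(n - 1, 1)
        for i, j in ((0, 1), (0, 2), (1, 2))
    ]
    # Distribution: relative position of the first, 25%, 50%, 75% and last
    # residue of each group (15 values; 0.0 if the group is absent).
    dist = []
    for i in range(3):
        pos = [k + 1 for k, c in enumerate(code) if c == i]
        for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
            if not pos:
                dist.append(0.0)
            else:
                idx = max(math.ceil(frac * len(pos)) - 1, 0)
                dist.append(pos[idx] / n)
    return comp + trans + dist


def composition_features(seq: str, properties: List[Dict[str, str]]) -> List[float]:
    """AAC followed by one CTD block per property: 20 + 21 * len(properties)."""
    feats = aac(seq)
    for groups in properties:
        feats += ctd(seq, groups)
    return feats
```

A PSSM is typically produced by an iterative database search such as PSI-BLAST, whereas the composition-based vector needs only the raw sequence, which is consistent with the 188-D encoding being the faster of the two.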
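The two-layer, multi-label decision flow described in the abstract can be sketched as below. The abstract does not name the learner inside each layer, so a hypothetical nearest-centroid binary classifier stands in for it. Layer 2 uses binary relevance (one yes/no model per function class), which is one standard way, assumed here, to let a single enzyme receive two or more class labels. `TwoLayerPredictor`, `NearestCentroid`, and the toy labels `EC1`/`EC2` are illustrative names, not the authors' software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Set

Vector = Sequence[float]


def _centroid(rows: List[Vector]) -> List[float]:
    if not rows:
        raise ValueError("each class needs positive and negative examples")
    return [sum(col) / len(rows) for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


@dataclass
class NearestCentroid:
    """Hypothetical binary stand-in learner: the closer centroid wins."""
    pos: List[float] = field(default_factory=list)
    neg: List[float] = field(default_factory=list)

    def fit(self, X: List[Vector], y: List[bool]) -> "NearestCentroid":
        self.pos = _centroid([x for x, t in zip(X, y) if t])
        self.neg = _centroid([x for x, t in zip(X, y) if not t])
        return self

    def predict(self, x: Vector) -> bool:
        return _sq_dist(x, self.pos) < _sq_dist(x, self.neg)


class TwoLayerPredictor:
    """Layer 1: enzyme vs. non-enzyme. Layer 2: one binary model per
    function class (binary relevance), so one enzyme may get several labels."""

    def __init__(self) -> None:
        self.layer1 = NearestCentroid()
        self.layer2: Dict[str, NearestCentroid] = {}

    def fit(self, X: List[Vector], labels: List[Set[str]]) -> "TwoLayerPredictor":
        # An empty label set marks a non-enzyme training example.
        self.layer1.fit(X, [bool(ls) for ls in labels])
        enzymes = [(x, ls) for x, ls in zip(X, labels) if ls]
        Xe = [x for x, _ in enzymes]
        for c in sorted(set().union(*(ls for _, ls in enzymes))):
            # Layer 2 models are trained on enzyme examples only.
            self.layer2[c] = NearestCentroid().fit(Xe, [c in ls for _, ls in enzymes])
        return self

    def predict(self, x: Vector) -> Optional[Set[str]]:
        if not self.layer1.predict(x):
            return None  # rejected by layer 1 as a non-enzyme; layer 2 is skipped
        return {c for c, clf in self.layer2.items() if clf.predict(x)}
```

`predict` returns `None` when layer 1 rejects the query and otherwise the (possibly empty) set of predicted function classes; a result with two or more classes flags a candidate multi-functional enzyme.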