Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions
Naoki Watanabe, Masahiro Murata, Teppei Ogawa, Christopher J. Vavricka, Akihiko Kondo, Chiaki Ogino, Michihiro Araki
DOI: https://doi.org/10.1021/acs.jcim.9b00877
IF: 6.162
2020-02-13
Journal of Chemical Information and Modeling
Abstract: Unannotated gene sequences in databases are increasing due to sequencing advances. Therefore, computational methods to predict functions of unannotated genes are needed. Moreover, novel enzyme discovery for metabolic engineering applications further encourages annotation of sequences. Here, enzyme functions are predicted using two general approaches, each including several machine learning algorithms. First, Enzyme-models (E-models) predict Enzyme Commission (EC) numbers from amino acid sequence information. Second, Substrate-Enzyme models (SE-models) are built to predict substrates of enzymatic reactions together with EC numbers, and Substrate-Enzyme-Product models (SEP-models) are built to predict substrates, products, and EC numbers. While accuracy of E-models is not optimal, SE-models and SEP-models predict EC numbers and reactions with high accuracy using all tested machine learning-based methods. For example, a single Random Forests-based SEP-model predicts EC first digits with an average AUC score of over 0.94. Various metrics indicate that the current strategy of combining sequence and chemical structure information is effective at improving enzyme reaction prediction.
chemistry, multidisciplinary, medicinal, computer science, interdisciplinary applications, information systems
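
The abstract's idea of combining sequence and chemical structure information, and of scoring EC first-digit predictions with an average AUC, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the abstract does not state which sequence descriptors, structure fingerprints, or AUC averaging scheme the authors used. The sketch uses amino acid composition as the sequence feature, a plain bit list as a stand-in substrate fingerprint, and a macro-average of one-vs-rest AUCs. The function names (`amino_acid_composition`, `se_features`, `binary_auc`, `average_auc`) are hypothetical, and the classifier itself (for example a random forest) is left out.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def amino_acid_composition(sequence):
    """Relative frequency of each standard amino acid (assumed descriptor)."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def se_features(sequence, substrate_bits):
    """Concatenate sequence features with a substrate fingerprint.

    substrate_bits is a hypothetical 0/1 list standing in for whatever
    chemical structure encoding a real SE-style model would use.
    """
    return amino_acid_composition(sequence) + [float(b) for b in substrate_bits]


def binary_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_auc(true_classes, class_scores):
    """Macro-average of one-vs-rest AUCs (assumed averaging scheme).

    true_classes: list of class labels, e.g. EC first digits.
    class_scores: list of dicts mapping each class label to a predicted score.
    """
    classes = sorted(set(true_classes))
    aucs = []
    for c in classes:
        labels = [t == c for t in true_classes]
        scores = [row[c] for row in class_scores]
        aucs.append(binary_auc(labels, scores))
    return sum(aucs) / len(aucs)


# Toy data: four enzymes with made-up EC first digits and made-up scores.
toy_true = [1, 1, 2, 3]
toy_scores = [
    {1: 0.80, 2: 0.10, 3: 0.10},
    {1: 0.25, 2: 0.65, 3: 0.10},
    {1: 0.30, 2: 0.60, 3: 0.10},
    {1: 0.20, 2: 0.20, 3: 0.60},
]
toy_average = average_auc(toy_true, toy_scores)
toy_features = se_features("AAC", [1, 0])
```

In this layout, an E-style model would see only the output of `amino_acid_composition`, while an SE-style model would see the longer vector from `se_features`. A SEP-style input could append a product fingerprint in the same way, again as an assumption about the general shape rather than the published method.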