Jason Yang, Francesca-Zhoufan Li, Frances H. Arnold
Abstract: Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
What problem does this paper attempt to address?
This paper discusses the applications and challenges of machine learning in enzyme engineering. The main goal of enzyme engineering is to optimize enzyme properties such as expression, stability, substrate range, and catalytic efficiency by modifying the amino acid sequence, and even to create new catalytic activities not found in nature. However, the space of possible protein sequences is enormous, so empirical methods such as directed evolution can explore only a small part of it and depend on first finding a starting enzyme that already shows some of the desired activity.
Machine learning (ML) has two main applications in this field. First, it can be used to discover starting points by annotating the functions of known protein sequences or by generating new protein sequences with specific functions. Second, it can guide fitness optimization by learning the relationship between protein sequences and their measured fitness values, and then using that relationship to navigate the protein fitness landscape. In both roles, ML models can help identify enzymes with a desired activity more quickly and improve their fitness for a given application.
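The second application, learning a mapping from sequence to fitness and using it to rank untested variants, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four-residue sequences, the fitness values, the function names, and the choice of a one-hot linear model fitted by plain gradient descent. It is a deliberately simple stand-in, not a model taken from the paper.

```python
from collections import defaultdict

# Hypothetical toy data: (sequence, measured fitness). "MKVL" plays the role
# of the starting enzyme; the other entries are variants assumed to have been assayed.
train = [
    ("MKVL", 1.0),
    ("MRVL", 1.5),
    ("MKIL", 1.4),
    ("MKVF", 0.6),
    ("MRVF", 1.1),
]

# Hypothetical untested variants that the model is asked to rank.
candidates = ["MRIL", "MKIF", "MRIF"]


def features(seq):
    """Sparse one-hot encoding: the active (position, residue) pairs."""
    return [(i, aa) for i, aa in enumerate(seq)]


def predict(weights, bias, seq):
    """Linear model: bias plus one weight per (position, residue) pair."""
    return bias + sum(weights[f] for f in features(seq))


def mse(weights, bias, data):
    """Mean squared error of the model on labelled data."""
    return sum((predict(weights, bias, s) - y) ** 2 for s, y in data) / len(data)


def fit(data, lr=0.05, steps=2000):
    """Fit the linear model by full-batch gradient descent on the MSE."""
    weights, bias = defaultdict(float), 0.0
    n = len(data)
    for _ in range(steps):
        grad_w, grad_b = defaultdict(float), 0.0
        for seq, y in data:
            err = predict(weights, bias, seq) - y
            grad_b += 2 * err / n
            for f in features(seq):
                grad_w[f] += 2 * err / n
        bias -= lr * grad_b
        for f, g in grad_w.items():
            weights[f] -= lr * g
    return weights, bias


weights, bias = fit(train)

# Rank the untested candidates by predicted fitness, best first.
ranked = sorted(candidates, key=lambda s: predict(weights, bias, s), reverse=True)
for seq in ranked:
    print(seq, round(predict(weights, bias, seq), 3))
```

Because the toy fitness values were chosen to be exactly additive, the model ranks `MRIL` first: it combines the two substitutions that each improved fitness on their own. A purely additive model like this one cannot capture interactions between mutations, which is one reason real fitness landscapes are harder to model.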
In the process of discovering functional enzymes, ML can be used to classify the sequences in existing protein databases, identify enzymatic activities that have not yet been annotated, or design new proteins using deep learning. Additionally, AI can simulate the roles of structural biologists and organic chemists to predict the feasibility of specific reactions.
For navigating the protein fitness landscape, ML models can predict the fitness of protein variants that have not been tested, thereby expanding the range of variants that can be screened and overcoming limitations of directed evolution, such as getting trapped at local optima and being restricted to one mutation at a time. However, current ML methods still face many challenges, such as handling non-additive effects of multiple mutations (epistasis), constructing more comprehensive models of protein fitness landscapes, and optimizing toward highly fit protein variants.
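The two limitations named above, local optima and epistasis, can be made concrete with a small sketch. The two-site landscape, its fitness values, and the helper names are hypothetical. Epistasis is computed here as the deviation from additivity on raw fitness values, which is one common convention among several.

```python
# Hypothetical toy landscape: fitness of every variant at two sites.
# "AG" is the starting enzyme; "VL" carries both substitutions.
landscape = {"AG": 1.0, "VG": 0.8, "AL": 0.7, "VL": 1.6}

# Residues allowed at each site in this toy example.
ALPHABET = {0: "AV", 1: "GL"}


def single_mutants(seq):
    """Yield every variant that differs from seq at exactly one site."""
    for i, options in ALPHABET.items():
        for aa in options:
            if aa != seq[i]:
                yield seq[:i] + aa + seq[i + 1:]


def greedy_walk(start):
    """Directed-evolution-style walk: accept the best single mutant
    only if it improves fitness; stop when no single mutant does."""
    current = start
    while True:
        best = max(single_mutants(current), key=landscape.get)
        if landscape[best] <= landscape[current]:
            return current
        current = best


def epistasis(wt, a, b, ab):
    """Deviation of the double mutant from the additive expectation:
    f(ab) - f(a) - f(b) + f(wt). Zero means the two effects simply add."""
    return landscape[ab] - landscape[a] - landscape[b] + landscape[wt]


stuck_at = greedy_walk("AG")
best_overall = max(landscape, key=landscape.get)
eps = epistasis("AG", "VG", "AL", "VL")

print("greedy walk from AG stops at:", stuck_at)
print("best variant on the landscape:", best_overall)
print("epistasis between the two sites:", round(eps, 3))
```

In this toy landscape each single substitution lowers fitness, so a walk that takes one mutation at a time never leaves `AG`, even though the double mutant `VL` is the fittest variant. The positive epistasis value records that the two substitutions together do far better than their individual effects would suggest.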
In conclusion, this paper explores how machine learning can improve enzyme engineering by discovering new enzymes and optimizing enzyme performance. It also points out key issues and potential strategies that future research needs to address.