Abstract: Enzymes can be engineered at the level of their amino acid sequences to optimize key properties such as expression, stability, substrate range, and catalytic efficiency, or even to unlock new catalytic activities not found in nature. Because the search space of possible proteins is vast, enzyme engineering usually involves discovering an enzyme starting point that has some level of the desired activity, followed by directed evolution to improve its "fitness" for a desired application. Recently, machine learning (ML) has emerged as a powerful tool to complement this empirical process. ML models can contribute to (1) starting point discovery by functional annotation of known protein sequences or generating novel protein sequences with desired functions and (2) navigating protein fitness landscapes for fitness optimization by learning mappings between protein sequences and their associated fitness values. In this Outlook, we explain how ML complements enzyme engineering and discuss its future potential to unlock improved engineering outcomes.
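Point (2) above, learning a mapping from protein sequences to fitness values, can be illustrated with a minimal sketch. It assumes the simplest possible setup, chosen for illustration and not taken from the Outlook: each variant is one-hot encoded per position, fitness is modeled as additive (linear), and the weights are fit by ridge regression solved through the normal equations. The residues and fitness values in `TOY` are invented, and `one_hot`, `fit_ridge`, and `predict` are hypothetical helpers, not a library API.

```python
"""Sketch: learn an additive sequence-to-fitness map with ridge regression.

Assumptions (illustrative only): one-hot features, a linear model, and
invented toy data. Practical models use richer encodings and nonlinear models.
"""

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq: str) -> list[float]:
    """Flatten a sequence into a 20 * len(seq) one-hot feature vector."""
    features: list[float] = []
    for residue in seq:
        features.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return features


def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ridge(seqs: list[str], fitness: list[float], alpha: float = 1e-3) -> list[float]:
    """Solve (X^T X + alpha * I) w = X^T y; a trailing constant feature is the intercept."""
    X = [one_hot(s) + [1.0] for s in seqs]
    d = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(d)] for i in range(d)]
    for i in range(d):
        xtx[i][i] += alpha  # the intercept is penalized too, for simplicity
    xty = [sum(row[i] * y for row, y in zip(X, fitness)) for i in range(d)]
    return solve(xtx, xty)


def predict(weights: list[float], seq: str) -> float:
    """Predicted fitness: dot product of the weights with the encoded sequence."""
    x = one_hot(seq) + [1.0]
    return sum(w * xi for w, xi in zip(weights, x))


# Hypothetical toy landscape: 3 positions, 2 allowed residues each, additive effects.
TOY = {
    a + b + c: 1.0 + 0.5 * (a == "V") + 0.3 * (b == "L") - 0.2 * (c == "T")
    for a in "AV"
    for b in "GL"
    for c in "ST"
}

# Train on 7 variants, then predict the held-out variant "VLT".
train = {s: f for s, f in TOY.items() if s != "VLT"}
weights = fit_ridge(list(train), list(train.values()))
heldout_prediction = predict(weights, "VLT")
print(f"VLT: predicted {heldout_prediction:.3f}, true {TOY['VLT']:.3f}")
```

Because the toy data are exactly additive, the linear model predicts the held-out variant closely. Epistatic data would break this assumption, which is one of the challenges the Outlook discusses.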

What problem does this paper attempt to address?

This paper discusses the applications and challenges of machine learning (ML) in enzyme engineering. Enzyme engineering aims to optimize enzyme properties such as expression, stability, substrate range, and catalytic efficiency by modifying the amino acid sequence, and even to create catalytic activities not found in nature. However, because protein sequence space is enormous, traditional methods such as directed evolution search it inefficiently and can miss the best enzymes.

ML has two main applications in this field. First, it can help discover starting points, either by annotating the functions of known protein sequences or by generating new protein sequences with specific functions. Second, it can help optimize proteins by learning the relationship between protein sequences and their fitness, and using that relationship to navigate the protein fitness landscape. Together, these uses help engineers quickly identify enzymes with the desired activity and then improve their fitness.

For discovering functional enzymes, ML can classify sequences in existing protein databases, identify unannotated enzymatic activities, or design new proteins using deep learning. AI can also emulate the roles of structural biologists and organic chemists to predict the feasibility of specific reactions.

For navigating the protein fitness landscape, ML models can predict the fitness of protein variants. This expands the range of variants that can be screened and helps overcome limitations of directed evolution, such as becoming trapped at local optima and considering mostly one mutation at a time.

However, current ML methods still face many challenges, including handling the non-additive effects of combining multiple mutations (epistasis), building more comprehensive models of protein fitness landscapes, and optimizing toward highly fit protein variants.

In conclusion, this paper explores how ML can improve enzyme engineering by discovering new enzymes and optimizing enzyme performance. It also points out key issues and potential strategies that future research needs to address.
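The summary names two problems that ML-guided navigation of the fitness landscape must handle: epistasis, and the local optima that trap stepwise, single-mutation directed evolution. A minimal sketch on a hypothetical two-site landscape illustrates both. The fitness values, the two-letter alphabet, and all function names are invented for illustration; epistasis is computed on a linear fitness scale (a log scale is also common). The lookup table stands in for measured data or for what a trained sequence-to-fitness model would predict.

```python
"""Sketch: epistasis and a local optimum on a hypothetical two-site landscape.

The fitness values are invented; the lookup table stands in for measured data
or for an ML model's predictions.
"""

from collections.abc import Iterator

# Reciprocal sign epistasis: each single mutant is worse than wild type ("AA"),
# but the double mutant "BB" is the best variant.
LANDSCAPE = {"AA": 1.0, "BA": 0.75, "AB": 0.5, "BB": 2.0}
ALPHABET = "AB"


def pairwise_epistasis(f_wt: float, f_a: float, f_b: float, f_ab: float) -> float:
    """Deviation of the double mutant from the additive expectation (linear scale)."""
    return f_ab - f_a - f_b + f_wt


def single_mutants(seq: str, alphabet: str) -> Iterator[str]:
    """Yield every sequence that differs from seq at exactly one position."""
    for i, current in enumerate(seq):
        for aa in alphabet:
            if aa != current:
                yield seq[:i] + aa + seq[i + 1:]


def greedy_walk(landscape: dict[str, float], start: str, alphabet: str) -> str:
    """Directed-evolution-style hill climb: accept only improving single mutations."""
    current = start
    while True:
        neighbors = [s for s in single_mutants(current, alphabet) if s in landscape]
        best = max(neighbors, key=landscape.__getitem__, default=current)
        if landscape[best] <= landscape[current]:
            return current  # no single mutation improves fitness: a local optimum
        current = best


epsilon = pairwise_epistasis(LANDSCAPE["AA"], LANDSCAPE["BA"], LANDSCAPE["AB"], LANDSCAPE["BB"])
additive_guess = LANDSCAPE["BA"] + LANDSCAPE["AB"] - LANDSCAPE["AA"]
stuck_at = greedy_walk(LANDSCAPE, "AA", ALPHABET)
# Scoring the whole combinatorial space (here by table lookup) finds the global optimum.
global_best = max(LANDSCAPE, key=LANDSCAPE.__getitem__)

print(f"epistasis={epsilon}, additive guess for BB={additive_guess}")
print(f"greedy walk stops at {stuck_at}; global optimum is {global_best}")
```

Here the additive expectation for "BB" (0.25) falls far below its assigned fitness (2.0), and the greedy walk never leaves "AA" because both single mutants are worse. Scoring every combination, which a fitness model makes feasible for larger libraries, recovers "BB".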

Opportunities and Challenges for Machine Learning-Assisted Enzyme Engineering

Machine learning-guided directed evolution for protein engineering

Machine Learning-Guided Protein Engineering

Data‐Driven Protein Engineering for Improving Catalytic Activity and Selectivity

Accelerated enzyme engineering by machine-learning guided cell-free expression

Enhanced Sequence-Activity Mapping and Evolution of Artificial Metalloenzymes by Active Learning

On synergy between ultrahigh throughput screening and machine learning in biocatalyst engineering

Navigating the landscape of enzyme design: from molecular simulations to machine learning

Machine Learning for Protein Engineering

Machine Learning in Nanozymes: From Design to Application

Machine learning facilitating the rational design of nanozymes

Adaptive machine learning for protein engineering

Evaluation of Machine Learning-Assisted Directed Evolution Across Diverse Combinatorial Landscapes

Machine learning-guided co-optimization of fitness and diversity facilitates combinatorial library design in enzyme engineering

Accelerating Biocatalysis Discovery with Machine Learning: A Paradigm Shift in Enzyme Engineering, Discovery, and Design

Knowledge-aware Reinforced Language Models for Protein Directed Evolution

Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering

Integrating Genetic Algorithms and Language Models for Enhanced Enzyme Design

Engineering of highly active and diverse nuclease enzymes by combining machine learning and ultra-high-throughput screening