Abstract: Although enzymes are efficient catalysts, natural enzymes often lack stability under industrial conditions and may not even catalyze the required reactions. This creates an urgent need to design new enzymes de novo. Computational design is a powerful tool for this purpose: it allows rapid and efficient exploration of sequence space and facilitates the design of novel enzymes tailored to specific conditions and requirements. Currently, the only tool designed explicitly for enzyme generation performs unsatisfactorily. We therefore selected several general-purpose protein sequence design tools and systematically evaluated their effectiveness when applied to specific industrial enzymes. After surveying the literature on protein generation, we grouped the computational methods used for sequence generation into three categories: structure-conditional sequence generation, sequence generation without structural constraints, and co-generation of sequence and structure. To evaluate the ability of six computational tools to generate enzyme sequences, we first constructed a luciferase dataset named Luc_64. We then assessed the quality of the enzyme sequences generated by these methods on this dataset, using measures that include amino acid distribution and EC number validation. We also assessed sequences generated by structure-based methods on existing public datasets, using sequence recovery rate and root-mean-square deviation (RMSD) to cover the sequence and structure perspectives, respectively. On the functionality dataset Luc_64, ABACUS-R and ProteinMPNN stood out for producing sequences whose amino acid distributions and functionalities closely match those of naturally occurring luciferases, suggesting that they preserve essential enzymatic characteristics. Across both benchmark datasets, ABACUS-R and ProteinMPNN also exhibited the highest sequence recovery rates, indicating a superior ability to generate sequences that closely resemble the native sequences of the input structures. Our study provides a reference for researchers selecting enzyme sequence design tools, highlighting the strengths and limitations of each tool in generating accurate and functional enzyme sequences. ProteinMPNN and ABACUS-R emerged as the most effective tools in our evaluation, performing well on sequence recovery and RMSD while maintaining the functional integrity of enzymes through accurate amino acid distributions. In addition, our industrial enzyme benchmark provides a fair evaluation of how well general protein design tools transfer to specific industrial enzymes.
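
Two of the metrics named in the abstract, sequence recovery rate and amino acid distribution, can be illustrated with a minimal Python sketch. This is not the authors' evaluation code. The function names are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and the Jensen-Shannon divergence is only one possible way to compare two amino acid distributions; the abstract does not state which comparison was used.

```python
from collections import Counter
import math

# The 20 standard amino acids; other symbols (gaps, X) are ignored below.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_recovery(native: str, designed: str) -> float:
    """Fraction of positions where the designed residue equals the native one.

    Assumes both sequences describe the same backbone and have equal length.
    """
    if len(native) != len(designed):
        raise ValueError("sequences must have equal length")
    if not native:
        raise ValueError("sequences must be non-empty")
    matches = sum(a == b for a, b in zip(native, designed))
    return matches / len(native)


def aa_distribution(sequences) -> dict:
    """Relative frequency of each standard amino acid over a set of sequences."""
    counts = Counter(res for seq in sequences for res in seq if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def js_divergence(p: dict, q: dict) -> float:
    """Jensen-Shannon divergence (log base 2) between two distributions.

    Illustrative choice of distance: 0 means identical, 1 means disjoint.
    """
    def kl(a, m):
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    m = {k: 0.5 * (p[k] + q[k]) for k in p}
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy sequences for illustration only, not data from the study.
    native = "MKTAYIAK"
    designed = "MKTAYLAK"
    print(sequence_recovery(native, designed))
    print(js_divergence(aa_distribution([native]), aa_distribution([designed])))
```

A lower divergence between generated and natural sequence sets would indicate a closer match in amino acid composition, while a higher recovery rate indicates that more native residues were reproduced at their original positions.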

Computational scoring and experimental evaluation of enzymes generated by neural networks

Assessing the laboratory performance of AI-generated enzymes

Generative Enzyme Design Guided by Functionally Important Sites and Small-Molecule Substrates

Towards Robust Evaluation of Protein Generative Models: A Systematic Analysis of Metrics

Comparative Analysis of Deep Generative Model for Industrial Enzyme Design

Exploration and Evaluation of Machine Learning-Based Models for Predicting Enzymatic Reactions

NeuroFold: A Multimodal Approach to Generating Novel Protein Variants

Enzyme Activity Prediction of Sequence Variants on Novel Substrates using Improved Substrate Encodings and Convolutional Pooling

Harnessing generative AI to decode enzyme catalysis and evolution for enhanced engineering

Evolutionary context-integrated deep sequence modeling for protein engineering

EnzymeNet: Residual Neural Networks model for Enzyme Commission number prediction

Expanding functional protein sequence spaces using generative adversarial networks

Predicting and Interpreting Protein Developability Via Transfer of Convolutional Sequence Representation

DeepEnzyme: a robust deep learning model for improved enzyme turnover number prediction by utilizing features of protein 3D-structures

[Progress in the application of artificial intelligence-assisted molecular modification of enzymes]

Advances in generative modeling methods and datasets to design novel enzymes for renewable chemicals and fuels

PDBench: Evaluating Computational Methods for Protein Sequence Design

Conditional generative modeling for de novo protein design with hierarchical functions

COMPUTATIONAL ENZYME DESIGN APPROACHES WITH SIGNIFICANT BIOLOGICAL OUTCOMES: PROGRESS AND CHALLENGES

Generative models for protein sequence modeling: recent advances and future directions

Accurate computational evolution of proteins and its dependence on deep learning