NeuroFold: A Multimodal Approach to Generating Novel Protein Variants

Keaun Amani, Michael Fish, Matthew D. Smith, Christian Danve M. Castroverde
DOI: https://doi.org/10.1101/2024.03.12.584504
2024-03-14
Abstract: The generation of high-performance enzyme variants with desired physicochemical and functional properties presents a formidable challenge in the field of protein engineering. Existing design methods are limited by inadequate training data, insufficient diversity within datasets, and suboptimal sampling techniques. Here, we introduce a novel approach that addresses these limitations and significantly improves the efficiency of generating functional enzyme variants. Using a multimodal approach, NeuroFold can leverage sequence, structural, and homology data during both sampling and discrimination phases, thereby enabling more diverse and informed sampling of the sequence space. Our model demonstrated a 40-fold increase in Spearman rank correlation as compared to large language models (LLMs) such as ESM-1v and empowers the rapid creation of high-quality enzyme variants, such as the β-lactamase variants generated by NeuroFold in this study, which demonstrated increased thermostability and varying levels of activity. This pipeline represents a promising advancement in the field of enzyme engineering, offering a valuable tool for the development of novel enzymes with enhanced performance and desired chemical properties.
Bioinformatics
What problem does this paper attempt to address?
The paper addresses the difficulty of designing high-performance enzyme variants with desired physicochemical and functional properties. According to the authors, existing design methods are limited by inadequate training data, insufficient dataset diversity, and suboptimal sampling techniques. The paper introduces NeuroFold, a multimodal approach that uses sequence, structural, and homology data during both the sampling and discrimination phases, with the aim of making the generation of functional enzyme variants significantly more efficient. The authors report a 40-fold increase in Spearman rank correlation compared to large language models such as ESM-1v. The β-lactamase variants generated by NeuroFold in the study showed increased thermostability and varying levels of activity. The authors present the pipeline as a promising tool for developing novel enzymes with enhanced performance and desired chemical properties.
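The headline metric above, Spearman rank correlation, measures how well a model's ranking of variants agrees with the ranking from experimental measurements. The following is a minimal sketch of how such a score could be computed. It is not the authors' evaluation code; the function names (`average_ranks`, `spearman`) and the toy numbers are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the full run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the two rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical toy values, NOT data from the paper.
predicted_scores = [0.12, 0.85, 0.40, 0.67, 0.30]   # model score per variant
measured_activity = [1.0, 9.5, 3.2, 4.1, 5.0]       # assay readout per variant

rho = spearman(predicted_scores, measured_activity)
print(f"Spearman rho = {rho:.2f}")
```

Because the metric depends only on ranks, a model can score well without predicting absolute activity values, as long as it orders the variants correctly.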