Learning to engineer protein flexibility

Petr Kouba, Joan Planas-Iglesias, Jiri Damborsky, Jiri Sedlar, Stanislav Mazurenko, Josef Sivic
2024-12-24
Abstract: Generative machine learning models are increasingly being used to design novel proteins for therapeutic and biotechnological applications. However, current methods mostly focus on designing proteins with a fixed backbone structure, which limits their ability to account for protein flexibility, one of the crucial properties for protein function. Learning to engineer protein flexibility is problematic because the available data are scarce, heterogeneous, and costly to obtain using both computational and experimental methods. Our contributions to address this problem are threefold. First, we comprehensively compare methods for quantifying protein flexibility and identify data relevant to learning. Second, we design and train flexibility predictors that take either sequence alone or both sequence and structure as input. We overcome the data scarcity issue by leveraging a pre-trained protein language model. Third, we introduce a method for fine-tuning a protein inverse folding model to steer it toward desired flexibility in specified regions. We demonstrate that our method Flexpert-Design enables guidance of inverse folding models toward increased flexibility. This opens up new possibilities for protein flexibility engineering and the development of proteins with enhanced biological activities.
Biomolecules
What problem does this paper attempt to address?
The main problem this paper addresses is that current protein design methods largely neglect protein flexibility. Most existing methods design sequences for a fixed backbone structure, which limits their ability to account for flexibility, even though flexibility is one of the key properties that determine protein function. The paper therefore aims to develop tools that integrate protein flexibility into computational protein design and so overcome this limitation.

### Specific description of the problem

1. **Importance of protein flexibility**:
   - Proteins are highly dynamic biomolecules, and their flexibility is crucial for biological function. In enzymes, for example, tuning the conformational dynamics of loop regions near the active site can significantly affect substrate specificity, turnover rate, and pH dependence.
   - Many proteins require small molecules to be transported to the active site through tunnels in their structures, and the dynamics of these tunnels are crucial for function.
2. **Limitations of existing methods**:
   - Current design methods mainly focus on fixed backbone structures and cannot fully account for protein flexibility.
   - Experimental methods (such as X-ray crystallography, nuclear magnetic resonance, and hydrogen-deuterium exchange coupled to mass spectrometry) are accurate but costly, time-consuming, and low-throughput.
   - Computational methods (such as coarse-grained modeling and molecular dynamics simulations) offer a wide range of options, but they have not been systematically compared on large-scale datasets and are difficult to integrate with recent generative models.
3. **Data scarcity**:
   - Available protein flexibility data are scarce, heterogeneous, and costly to obtain, whether by computational or experimental methods.
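The missing systematic comparison mentioned above can be made concrete. A common reference for per-residue flexibility is the root-mean-square fluctuation (RMSF) of C-alpha atoms over a molecular dynamics trajectory, and a cheaper method can be scored by how well its per-residue values correlate with that reference. The sketch below is illustrative only and is not the authors' pipeline: it assumes the frames are already superposed onto a common reference, uses one C-alpha coordinate per residue, and takes Pearson correlation as the agreement score. The function names `rmsf` and `pearson` and the toy data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def rmsf(frames: Sequence[Sequence[Coord]]) -> List[float]:
    """Per-residue RMSF from MD frames.

    Assumes every frame has already been superposed onto a common
    reference and holds one C-alpha coordinate per residue.
    """
    n_frames = len(frames)
    n_residues = len(frames[0])
    values = []
    for i in range(n_residues):
        # Average position of residue i over the trajectory.
        mean = [sum(frame[i][k] for frame in frames) / n_frames for k in range(3)]
        # Mean squared deviation of residue i from its average position.
        msd = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3)) for frame in frames
        ) / n_frames
        values.append(math.sqrt(msd))
    return values


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two per-residue flexibility profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)


if __name__ == "__main__":
    # Toy 3-residue, 2-frame trajectory: residue 0 is rigid, residue 2 moves most.
    frames = [
        [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
        [(0.0, 0.0, 0.0), (1.0, 0.2, 0.0), (2.0, 1.0, 0.0)],
    ]
    reference = rmsf(frames)  # approximately [0.0, 0.1, 0.5]
    # Hypothetical per-residue scores from a cheaper flexibility method.
    proxy = [0.1, 0.3, 0.8]
    print(reference, pearson(reference, proxy))
```

A proxy that ranks residues in the same order as the MD reference yields a correlation close to 1. In practice the comparison would be repeated over many proteins and the scores aggregated, which is what makes it systematic.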
### Solution

To address these problems, the authors make three main contributions:

1. **Comprehensive comparison of methods for quantifying protein flexibility**:
   - They systematically evaluate how well different methods (such as molecular dynamics simulations, crystallographic B-factors, AlphaFold2, ESMFold, the Gaussian Network Model (GNM), and the Anisotropic Network Model (ANM)) quantify protein flexibility, and they identify data suitable for learning.
2. **Design and training of flexibility predictors**:
   - They develop two flexibility predictors, Flexpert-Seq (sequence only) and Flexpert-3D (sequence plus structure), and overcome data scarcity by leveraging pre-trained protein language models such as ProtTrans.
3. **The Flexpert-Design framework**:
   - They propose a method for fine-tuning protein inverse folding models (such as ProteinMPNN) so that they generate sequences with the desired flexibility according to specified flexibility instructions.

### Conclusion

The authors demonstrate that flexibility can be predicted from sequence alone and that incorporating structural information further improves prediction accuracy. They also show how to guide inverse folding models to generate sequences with increased flexibility, opening up new possibilities for protein flexibility engineering.
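The general idea behind the third contribution, steering an inverse folding model toward higher flexibility in chosen regions, can be illustrated with a combined training objective: the usual sequence-recovery (cross-entropy) term plus a penalty whenever a flexibility predictor scores the designed positions below a target. This is a minimal sketch under stated assumptions, not the paper's loss. The hinge penalty, the `weight` parameter, and the toy per-amino-acid `residue_score` table standing in for a learned flexibility predictor are all hypothetical. A real implementation would be written in an autodiff framework so that the gradient of the flexibility term reaches the inverse folding model's parameters.

```python
import math
from typing import Dict, List, Sequence

# Per-position probability distributions over amino acids, e.g. the softmax
# output of an inverse folding model for one designed sequence.
Profile = List[Dict[str, float]]


def sequence_recovery_loss(probs: Profile, native: str) -> float:
    """Mean negative log-likelihood of the native residue at each position."""
    return -sum(math.log(p[aa]) for p, aa in zip(probs, native)) / len(native)


def expected_flexibility(probs: Profile, residue_score: Dict[str, float]) -> List[float]:
    """Hypothetical stand-in for a learned flexibility predictor: the
    probability-weighted average of a fixed per-amino-acid score. It is a
    smooth function of the probabilities, so in an autodiff framework
    gradients could flow through it."""
    return [sum(p[aa] * residue_score[aa] for aa in p) for p in probs]


def flexibility_guided_loss(
    probs: Profile,
    native: str,
    region: Sequence[int],
    target: float,
    residue_score: Dict[str, float],
    weight: float = 1.0,
) -> float:
    """Sequence-recovery loss plus a hinge penalty on positions in `region`
    whose predicted flexibility falls below `target` (hypothetical form)."""
    if not region:
        raise ValueError("region must contain at least one position")
    recovery = sequence_recovery_loss(probs, native)
    flex = expected_flexibility(probs, residue_score)
    # Only positions below the target contribute; more flexible ones cost nothing.
    shortfall = [max(0.0, target - flex[i]) for i in region]
    return recovery + weight * sum(shortfall) / len(region)


if __name__ == "__main__":
    # Two designed positions over a reduced two-letter alphabet for brevity.
    probs = [{"A": 0.5, "G": 0.5}, {"A": 0.9, "G": 0.1}]
    score = {"A": 0.2, "G": 1.0}  # hypothetical scores: glycine treated as most flexible
    print(flexibility_guided_loss(probs, "AA", region=[1], target=0.5, residue_score=score))
```

Raising `weight` trades sequence recovery for flexibility in the chosen region. A hinge, rather than a squared error, leaves positions that already exceed the target unpenalized, which fits the goal of increasing flexibility rather than matching an exact value.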