Top-down machine learning of coarse-grained protein force-fields

Carles Navarro, Maciej Majewski, Gianni de Fabritiis
2023-10-10
Abstract: Developing accurate and efficient coarse-grained representations of proteins is crucial for understanding their folding, function, and interactions over extended timescales. Our methodology involves simulating proteins with molecular dynamics and utilizing the resulting trajectories to train a neural network potential through differentiable trajectory reweighting. Remarkably, this method requires only the native conformation of proteins, eliminating the need for labeled data derived from extensive simulations or memory-intensive end-to-end differentiable simulations. Once trained, the model can be employed to run parallel molecular dynamics simulations and sample folding events for proteins both within and beyond the training distribution, showcasing its extrapolation capabilities. By applying Markov State Models, native-like conformations of the simulated proteins can be predicted from the coarse-grained simulations. Owing to its theoretical transferability and ability to use solely experimental static structures as training data, we anticipate that this approach will prove advantageous for developing new protein force fields and further advancing the study of protein dynamics, folding, and interactions.
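The Markov State Model step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the coarse-grained frames have already been discretized into conformational states (the featurization and clustering used in the paper are not described here), estimates a transition matrix at a fixed lag time from pooled parallel trajectories, and takes the state with the largest equilibrium population as the predicted native-like conformation. That selection rule, the toy trajectories, and all function names are hypothetical. Dedicated packages such as PyEMMA or deeptime provide maximum-likelihood reversible estimators and lag-time validation that this toy omits.

```python
def count_matrix(dtrajs, n_states, lag):
    """Count transitions i -> j separated by `lag` frames, pooled over all trajectories."""
    counts = [[0.0] * n_states for _ in range(n_states)]
    for traj in dtrajs:
        for t in range(len(traj) - lag):
            counts[traj[t]][traj[t + lag]] += 1.0
    return counts


def reversible_transition_matrix(counts):
    """Row-normalize the symmetrized counts C + C^T.

    This is a simple reversible estimate, not the maximum-likelihood estimator used by
    dedicated MSM software. By construction it satisfies detailed balance, with a
    stationary distribution proportional to the row sums of C + C^T.
    """
    n = len(counts)
    sym = [[counts[i][j] + counts[j][i] for j in range(n)] for i in range(n)]
    row_sums = [sum(row) for row in sym]
    transition = [
        [sym[i][j] / row_sums[i] if row_sums[i] > 0 else 0.0 for j in range(n)]
        for i in range(n)
    ]
    total = sum(row_sums)
    stationary = [r / total for r in row_sums]
    return transition, stationary


# Hypothetical discretized trajectories from two parallel CG simulations, where each
# frame has already been assigned to one of 3 conformational states (e.g. by clustering).
dtrajs = [
    [0, 0, 1, 1, 2, 2, 2, 2, 2, 1, 2, 2],
    [1, 0, 0, 1, 2, 2, 2, 2, 1, 2, 2, 2],
]
counts = count_matrix(dtrajs, n_states=3, lag=1)
transition, stationary = reversible_transition_matrix(counts)

# Assumption: take the most populated state at equilibrium as the native-like prediction.
native_like = max(range(len(stationary)), key=stationary.__getitem__)
print("stationary distribution:", [round(p, 3) for p in stationary])
print("predicted native-like state:", native_like)
```

Pooling counts across trajectories is what lets many short parallel simulations, rather than one long one, feed a single equilibrium model. In practice the lag time would be chosen by checking that implied timescales have converged.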
Biomolecules, Machine Learning
What problem does this paper attempt to address?
The paper addresses how to study protein folding, function, and interactions over long timescales. Specifically, the researchers develop a method that trains a Neural Network Potential (NNP) to act as a coarse-grained force field for proteins. Its main features are:

1. **No need for labeled data**: the method requires only the native conformations of proteins as training data. It does not rely on labeled data derived from extensive simulations, nor on memory-intensive end-to-end differentiable simulations.
2. **Efficiency**: by using Differentiable Trajectory Reweighting (DiffTRe), the network is trained on reweighted frames from ordinary simulations, so the computational graph of every integration step does not need to be stored for backpropagation. This substantially reduces memory consumption.
3. **Extrapolation capability**: once trained, the model can run parallel molecular dynamics simulations that sample folding events for proteins both within and beyond the training distribution.
4. **Transferability**: the approach is, in principle, transferable across proteins, and it can be trained solely on experimental static structures. The authors therefore expect it to help develop new protein force fields and further advance the study of protein dynamics, folding, and interactions.

In summary, the paper's goal is to develop an efficient machine learning method for building accurate coarse-grained protein models, in order to better understand and predict protein behavior.
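The reweighting idea behind DiffTRe can be sketched in a few lines. This is a minimal toy, not the authors' implementation: it assumes a one-dimensional coordinate, a hypothetical harmonic potential `U_theta(x) = theta * x**2 / 2` standing in for the neural network potential, and a hypothetical squared-deviation-from-native observable standing in for the paper's training loss, whose exact form is not given here. Frames sampled once under reference parameters `theta_ref` are reused for nearby `theta` through Boltzmann weights w_i ∝ exp(-β[U_theta(x_i) - U_theta_ref(x_i)]). The loss gradient then needs only energies and `dU/dtheta` evaluated at the stored frames, so nothing is backpropagated through the integrator.

```python
import math
import random

BETA = 1.0  # inverse temperature 1/(k_B T) in reduced units (assumption)


def potential(theta, x):
    """Hypothetical toy potential U_theta(x) = theta * x^2 / 2 (stands in for the NNP)."""
    return 0.5 * theta * x * x


def dpotential_dtheta(theta, x):
    """dU_theta/dtheta for the toy potential (theta drops out since U is linear in it)."""
    return 0.5 * x * x


def reweighting_weights(theta, theta_ref, samples):
    """Normalized weights w_i proportional to exp(-beta [U_theta(x_i) - U_ref(x_i)])."""
    log_w = [-BETA * (potential(theta, x) - potential(theta_ref, x)) for x in samples]
    shift = max(log_w)  # subtract the max before exponentiating, for numerical stability
    unnormalized = [math.exp(lw - shift) for lw in log_w]
    z = sum(unnormalized)
    return [u / z for u in unnormalized]


def effective_sample_size(weights):
    """Entropy-based estimate N_eff = exp(-sum_i w_i ln w_i); equals N for uniform weights."""
    return math.exp(-sum(w * math.log(w) for w in weights if w > 0.0))


def reweighted_loss_and_grad(theta, theta_ref, samples, observable):
    """Reweighted ensemble average <O>_theta and its derivative d<O>/dtheta.

    For an observable O(x) that does not depend on theta,
        d<O>/dtheta = -beta * (<O dU/dtheta> - <O> <dU/dtheta>),
    with every average taken over the reweighted frames.
    """
    w = reweighting_weights(theta, theta_ref, samples)
    obs = [observable(x) for x in samples]
    du = [dpotential_dtheta(theta, x) for x in samples]
    mean_o = sum(wi * oi for wi, oi in zip(w, obs))
    mean_du = sum(wi * di for wi, di in zip(w, du))
    mean_o_du = sum(wi * oi * di for wi, oi, di in zip(w, obs, du))
    return mean_o, -BETA * (mean_o_du - mean_o * mean_du)


def squared_deviation(x):
    """Hypothetical 'distance from native' observable, with the native state at x = 0."""
    return x * x


# Stand-in for frames from one reference simulation run at theta_ref: for this
# harmonic toy the Boltzmann distribution is Gaussian with variance 1/(beta*theta_ref).
rng = random.Random(0)
theta_ref = 1.0
samples = [rng.gauss(0.0, 1.0 / math.sqrt(BETA * theta_ref)) for _ in range(20000)]

# Evaluate the loss and its gradient at new parameters without re-simulating.
theta = 1.2
loss, grad = reweighted_loss_and_grad(theta, theta_ref, samples, squared_deviation)
n_eff = effective_sample_size(reweighting_weights(theta, theta_ref, samples))
print(f"<O> = {loss:.3f} (exact {1.0 / (BETA * theta):.3f}), d<O>/dtheta = {grad:.3f}")
print(f"N_eff / N = {n_eff / len(samples):.3f}")
```

Here the gradient is negative, so a descent step increases `theta` and pulls the toy ensemble toward the "native" point. In the DiffTRe formulation, as we understand it, the same reference frames are reused across parameter updates until N_eff / N drops below a chosen threshold, at which point a fresh trajectory is simulated with the current parameters.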