Deep learning of protein energy landscape and conformational dynamics from experimental structures in PDB

Yike Tang, Mendi Yu, Ganggang Bai, Xinjun Li, Yanyan Xu, Buyong Ma
DOI: https://doi.org/10.1101/2024.06.27.600251
2024-06-27
Abstract: Protein structure prediction has reached revolutionary levels of accuracy on single structures, implying that a biophysical energy function can be learned from known protein structures. However, beyond a single static structure, conformational distributions and dynamics often control protein biological function. In this work, we tested the hypothesis that the protein energy landscape and conformational dynamics can be learned from experimental structures in the PDB and coevolution data. Towards this goal, we developed DeepConformer, a diffusion generative model for sampling protein conformational distributions from a given amino acid sequence. Despite the lack of molecular dynamics (MD) simulation data in the training process, DeepConformer captured conformational flexibility and dynamics (RMSF and covariance matrix correlation) similar to MD simulation and reproduced experimentally observed conformational variations. Our study demonstrates that the energy landscape learned by DeepConformer can be used to efficiently explore protein conformational distributions and dynamics.
Biophysics
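The abstract compares generated ensembles with MD using RMSF and covariance-matrix correlation. The following is a minimal NumPy sketch of how such a comparison could be computed, not the authors' protocol. It assumes each ensemble is an array of C-alpha coordinates with shape `(n_conformations, n_residues, 3)` that has already been superimposed on a common reference, and it uses one common definition of the residue covariance, `C_ij = <dr_i . dr_j>`. The function names `rmsf`, `residue_covariance`, and `pearson` are illustrative, and the demo data are synthetic.

```python
import numpy as np


def rmsf(ensemble):
    """Per-residue root-mean-square fluctuation.

    ensemble: array of shape (n_conformations, n_residues, 3),
    assumed to be pre-aligned to a common reference frame.
    """
    mean_structure = ensemble.mean(axis=0)
    # Squared displacement of every residue in every conformation.
    sq_dev = ((ensemble - mean_structure) ** 2).sum(axis=-1)
    return np.sqrt(sq_dev.mean(axis=0))


def residue_covariance(ensemble):
    """Residue-by-residue covariance C_ij = <dr_i . dr_j>."""
    disp = ensemble - ensemble.mean(axis=0)
    n_conf = ensemble.shape[0]
    # Sum over conformations (c) and Cartesian axes (d).
    return np.einsum("cid,cjd->ij", disp, disp) / n_conf


def pearson(a, b):
    """Pearson correlation between two arrays after flattening."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])


if __name__ == "__main__":
    # Synthetic stand-ins for a generated ensemble and an MD ensemble:
    # both share a fluctuation amplitude that grows along the chain.
    rng = np.random.default_rng(0)
    n_res = 30
    reference = rng.normal(size=(n_res, 3)) * 10.0
    amplitude = np.linspace(0.2, 2.0, n_res)[None, :, None]
    generated = reference + amplitude * rng.normal(size=(200, n_res, 3))
    md_like = reference + amplitude * rng.normal(size=(200, n_res, 3))

    print("RMSF correlation:", pearson(rmsf(generated), rmsf(md_like)))
    print(
        "Covariance correlation:",
        pearson(residue_covariance(generated), residue_covariance(md_like)),
    )
```

With this definition the diagonal of the covariance matrix equals the squared RMSF, so the two metrics are related: RMSF summarizes per-residue flexibility, while the off-diagonal terms capture correlated motion between residues.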
### What problems does this paper attempt to solve?

This paper addresses key problems in protein structure prediction and the simulation of protein dynamics. Specifically:

1. **The gap between static structure and dynamic behavior**:
   - Existing protein structure prediction methods (such as AlphaFold2) have made revolutionary progress, but they mainly focus on predicting a single structure of experimental-level accuracy from the amino acid sequence. The function of a protein, however, depends not only on its static structure but also on its conformational distribution and dynamic behavior in solution.
   - The paper points out that understanding the dynamic characteristics of proteins is crucial for revealing their biological functions. A method is therefore needed that can effectively sample the conformational distribution of a protein and capture its dynamic behavior.
2. **Learning energy landscapes and conformational dynamics from experimental structures**:
   - The authors hypothesize that the energy landscapes and conformational dynamics of proteins can be learned from the experimental structures in the PDB (Protein Data Bank) and coevolution data. If so, the conformational flexibility and dynamic characteristics of proteins can be captured without relying on computationally intensive molecular dynamics (MD) simulation data.
   - To this end, they developed a diffusion generative model named DeepConformer, which samples the conformational distribution of a protein from a given amino acid sequence.
3. **Improving the sampling ability for complex conformational changes**:
   - To enhance the model's ability to learn the protein energy landscape, DeepConformer introduces several techniques, such as correlating different structures, masking a large proportion (50-70%) of amino acid positions, and clustering the multiple sequence alignment (MSA).
   - These techniques enable DeepConformer not only to generate conformations close to known structures but also to explore the transition paths of proteins between different energy basins, thereby better capturing complex conformational changes.

### Summary

The core problem of the paper is how to use deep learning, in particular diffusion generative models, to learn the energy landscapes and conformational dynamics of proteins from experimental structures and coevolution data, and thereby make up for the limitations of existing static-structure prediction methods in modeling dynamic behavior. With this approach, the researchers hope to understand the dynamic characteristics of proteins and their biological functions more comprehensively.
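The masking technique listed in item 3 above can be illustrated with a short sketch. This is a hypothetical illustration, not DeepConformer's implementation: the source only states that 50-70% of amino acid positions are masked, so the choices here (masking whole alignment columns, drawing the fraction uniformly from that range, and using `"X"` as the mask token) are assumptions, as are the function name `mask_positions` and its parameters.

```python
import numpy as np


def mask_positions(msa, low=0.5, high=0.7, mask_token="X", rng=None):
    """Mask a random 50-70% of alignment columns (assumed scheme).

    msa: 2D array of single-character strings,
         shape (n_sequences, n_positions).
    Returns the masked copy and the sorted indices of masked columns.
    """
    rng = np.random.default_rng() if rng is None else rng
    msa = np.asarray(msa)
    n_positions = msa.shape[1]

    # Draw the masking fraction for this sample, then pick that many
    # distinct columns without replacement.
    fraction = float(rng.uniform(low, high))
    n_masked = int(round(fraction * n_positions))
    masked_cols = np.sort(rng.choice(n_positions, size=n_masked, replace=False))

    masked = msa.copy()  # leave the input alignment untouched
    masked[:, masked_cols] = mask_token
    return masked, masked_cols


if __name__ == "__main__":
    # Toy two-sequence alignment; the sequences are made up.
    toy_msa = np.array([
        list("MKTAYIAKQRQISFVKSHFS"),
        list("MKSAYLAKQRQLSFVKTHFS"),
    ])
    masked, cols = mask_positions(toy_msa, rng=np.random.default_rng(1))
    for row in masked:
        print("".join(row))
    print("masked columns:", cols.tolist())
```

Masking this heavily removes much of the coevolution signal from any single input, which is one plausible way to push a model toward sampling several conformations rather than collapsing onto one.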