DiAMoNDBack: Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping of Cα Protein Traces

Michael S. Jones, Kirill Shmilovich, Andrew L. Ferguson
2023-07-24
Abstract: Coarse-grained molecular models of proteins permit access to length and time scales unattainable by all-atom models and the simulation of processes that occur on long time scales, such as aggregation and folding. The reduced resolution realizes computational accelerations, but an atomistic representation can be vital for a complete understanding of mechanistic details. Backmapping is the process of restoring all-atom resolution to coarse-grained molecular models. In this work, we report DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping) as an autoregressive denoising diffusion probabilistic model to restore all-atom details to coarse-grained protein representations retaining only Cα coordinates. The autoregressive generation process proceeds from the protein N-terminus to C-terminus in a residue-by-residue fashion conditioned on the Cα trace and previously backmapped backbone and side chain atoms within the local neighborhood. The local and autoregressive nature of our model makes it transferable between proteins. The stochastic nature of the denoising diffusion process means that the model generates a realistic ensemble of backbone and side chain all-atom configurations consistent with the coarse-grained Cα trace. We train DiAMoNDBack over 65k+ structures from the Protein Data Bank (PDB) and validate it in applications to a hold-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins from DE Shaw Research, and coarse-grained simulation data. We achieve state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain configurational states. We make the DiAMoNDBack model publicly available as a free and open source Python package.
Biomolecules, Machine Learning
What problem does this paper attempt to address?
The paper addresses the loss of atomic detail in coarse-grained protein models used in molecular dynamics simulations, which becomes a problem when full atomic resolution is needed for mechanistic analysis. The authors propose DiAMoNDBack (Diffusion-denoising Autoregressive Model for Non-Deterministic Backmapping), an autoregressive generative model based on the denoising diffusion probabilistic model (DDPM), to recover all-atom detail from coarse-grained protein representations that retain only the Cα coordinates. DiAMoNDBack keeps the Cα coordinates fixed and generates the all-atom structure one residue at a time, from the N-terminus to the C-terminus, conditioning each residue on the Cα trace and on the previously backmapped atoms in its local neighborhood. Because the model is local and autoregressive, it transfers between different proteins. Because the diffusion process is stochastic, it generates an ensemble of physically plausible all-atom conformations consistent with the same Cα trace rather than a single structure. To validate DiAMoNDBack, the authors trained it on more than 65,000 structures from the Protein Data Bank (PDB) and tested it on a held-out PDB test set, intrinsically disordered protein structures from the Protein Ensemble Database (PED), molecular dynamics simulations of fast-folding mini-proteins, and coarse-grained simulation data. The results show state-of-the-art reconstruction performance in terms of correct bond formation, avoidance of side chain clashes, and diversity of the generated side chain conformational states. One drawback is generation speed: DiAMoNDBack is much slower than other methods such as GenZProt and PULCHRA.
Nevertheless, the authors have made DiAMoNDBack available to the research community as a free and open-source Python package.
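The residue-by-residue generation scheme described above can be illustrated with a toy sketch. This is not the DiAMoNDBack package or its API: `toy_reverse_diffusion` is a hypothetical stand-in for the trained conditional denoiser, and the window size, atom count, step count, and noise schedule are placeholder assumptions. The sketch only shows the control flow: iterate from the N-terminus to the C-terminus, build a local context from nearby Cα positions and previously placed atoms, and sample each residue's atoms by iterative denoising from Gaussian noise.

```python
import random


def centroid(points):
    """Mean position of a non-empty list of 3D points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(3)]


def toy_reverse_diffusion(ca, context, n_atoms, n_steps, rng):
    """Hypothetical stand-in for a learned denoiser (assumption, not the real model).

    Starts from Gaussian noise around the C-alpha and repeatedly pulls the
    atoms toward a context-dependent target while injecting shrinking noise.
    """
    c = centroid(context)
    target = [ca[k] + 0.3 * (c[k] - ca[k]) for k in range(3)]
    atoms = [[ca[k] + rng.gauss(0.0, 1.0) for k in range(3)] for _ in range(n_atoms)]
    for step in range(n_steps, 0, -1):
        sigma = 0.5 * (step - 1) / n_steps  # noise level reaches zero at the final step
        for atom in atoms:
            for k in range(3):
                atom[k] += 0.5 * (target[k] - atom[k]) + rng.gauss(0.0, sigma)
    return atoms


def backmap(ca_trace, window=2, n_atoms=4, n_steps=25, seed=None):
    """Autoregressively place atoms for each residue, N-terminus to C-terminus."""
    rng = random.Random(seed)
    placed = []  # placed[i] holds the atoms generated for residue i
    for i, ca in enumerate(ca_trace):
        lo = max(0, i - window)
        hi = min(len(ca_trace), i + window + 1)
        context = [list(p) for p in ca_trace[lo:hi]]  # local C-alpha neighborhood
        for atoms in placed[lo:i]:  # previously backmapped neighboring residues
            context.extend(atoms)
        placed.append(toy_reverse_diffusion(ca, context, n_atoms, n_steps, rng))
    return placed


# A straight five-residue trace with 3.8 Angstrom C-alpha spacing.
trace = [(3.8 * i, 0.0, 0.0) for i in range(5)]
# Different seeds give different all-atom samples for the same C-alpha trace.
ensemble = [backmap(trace, seed=s) for s in range(3)]
```

Each call leaves the Cα trace untouched and returns one sample, so repeated calls with different seeds mimic the non-deterministic ensemble generation that the summary describes. The nested loop over residues and denoising steps also hints at why this style of generation is slower than single-pass backmapping methods.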