Protein structure generation via folding diffusion

Kevin E. Wu, Kevin K. Yang, Rianne van den Berg, Sarah Alamdari, James Y. Zou, Alex X. Lu, Ava P. Amini
DOI: https://doi.org/10.1038/s41467-024-45051-2
IF: 16.6
2024-02-05
Nature Communications
Abstract: The ability to computationally generate novel yet physically foldable protein structures could lead to new biological discoveries and new treatments targeting yet incurable diseases. Despite recent advances in protein structure prediction, directly generating diverse, novel protein structures from neural networks remains difficult. In this work, we present a diffusion-based generative model that generates protein backbone structures via a procedure inspired by the natural folding process. We describe a protein backbone structure as a sequence of angles capturing the relative orientation of the constituent backbone atoms, and generate structures by denoising from a random, unfolded state towards a stable folded structure. Not only does this mirror how proteins natively twist into energetically favorable conformations, the inherent shift and rotational invariance of this representation crucially alleviates the need for more complex equivariant networks. We train a denoising diffusion probabilistic model with a simple transformer backbone and demonstrate that our resulting model unconditionally generates highly realistic protein structures with complexity and structural patterns akin to those of naturally-occurring proteins. As a useful resource, we release an open-source codebase and trained models for protein structure diffusion.
multidisciplinary sciences
What problem does this paper attempt to address?
This paper addresses the problem of computationally generating novel protein structures that are physically foldable. Despite recent advances in structure prediction, directly generating diverse, realistic protein structures from neural networks remains difficult. The authors propose a diffusion-based generative model inspired by the natural folding process: a protein backbone is represented as a sequence of angles capturing the relative orientation of its constituent backbone atoms, and structures are generated by denoising from a random, unfolded state towards a stable folded structure. Because this angular representation is inherently invariant to shifts and rotations, it removes the need for more complex equivariant networks, and a denoising diffusion probabilistic model with a simple transformer backbone suffices. The generated backbones show complexity and structural patterns akin to those of naturally occurring proteins, and the authors suggest that this capability could lead to new biological discoveries and new treatments for currently incurable diseases. They also release an open-source codebase and trained models for protein structure diffusion.
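To make the angle-based representation concrete, the following minimal Python sketch illustrates two ideas from the abstract: an internal angle (here a dihedral defined by four consecutive points) does not change when the whole structure is rotated and shifted, and a set of angles can be pushed towards a random state by adding Gaussian noise. It is not taken from the paper's codebase. The function names, the toy coordinates, the wrapping of noisy angles back into [-pi, pi), and the value alpha_bar=0.5 are all assumptions chosen for illustration.

```python
import math
import random


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (radians) defined by four consecutive points."""
    b0, b1, b2 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b0, b1), cross(b1, b2)   # normals of the two planes
    norm_b1 = math.sqrt(dot(b1, b1))
    b1_hat = tuple(x / norm_b1 for x in b1)
    m1 = cross(b1_hat, n1)
    return math.atan2(dot(m1, n2), dot(n1, n2))


def rigid_transform(p, theta, shift):
    """Rotate a point about the z-axis by theta, then translate by shift."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + shift[0], s * x + c * y + shift[1], z + shift[2])


def wrap(angle):
    """Map an angle into [-pi, pi), up to floating-point rounding."""
    return (angle + math.pi) % (2 * math.pi) - math.pi


def noise_angles(angles, alpha_bar, rng):
    """DDPM-style closed-form noising of angles, wrapped back onto the circle.

    Assumption for illustration: alpha_bar is a single hypothetical schedule
    value in (0, 1); smaller values mean more noise.
    """
    scale, sigma = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [wrap(scale * a + sigma * rng.gauss(0.0, 1.0)) for a in angles]


# Toy coordinates (hypothetical, not real backbone atoms).
points = [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 1.0, 1.0)]
moved = [rigid_transform(p, 0.7, (5.0, -2.0, 3.0)) for p in points]

angle_before = dihedral(*points)
angle_after = dihedral(*moved)    # same angle after rotation + shift

rng = random.Random(0)
noisy = noise_angles([angle_before, -2.0, 3.0], alpha_bar=0.5, rng=rng)
print(angle_before, angle_after, noisy)
```

Because the dihedral depends only on the relative positions of the points, the two printed angles agree to numerical precision. This is the sense in which a network operating on such angles does not need to be equivariant to rigid motions of the input.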