Sifting through the Noise: A Survey of Diffusion Probabilistic Models and Their Applications to Biomolecules

Trevor Norton, Debswapna Bhattacharya
2024-06-01
Abstract: Diffusion probabilistic models have made their way into a number of high-profile applications since their inception. In particular, there has been a wave of research into using diffusion models in the prediction and design of biomolecular structures and sequences. Their growing ubiquity makes it imperative for researchers in these fields to understand them. This paper serves as a general overview for the theory behind these models and the current state of research. We first introduce diffusion models and discuss common motifs used when applying them to biomolecules. We then present the significant outcomes achieved through the application of these models in generative and predictive tasks. This survey aims to provide readers with a comprehensive understanding of the increasingly critical role of diffusion models.
Biomolecules, Artificial Intelligence, Machine Learning, Quantitative Methods
What problem does this paper attempt to address?
This paper surveys the current status and progress of diffusion probabilistic models applied to biomolecules. These models first gained prominence in image generation and have since been applied in areas such as computer vision, audio generation, robotics, and protein structure prediction. As deep generative models, they gradually add noise to data until a complex high-dimensional distribution becomes a tractable prior, then sample by starting from that prior and removing the noise step by step. This makes them well suited to high-dimensional, irregular distributions.

Biomolecular data are high-dimensional and follow complex distributions, which makes problems such as protein folding hard for traditional computational methods. Diffusion models address these problems by combining scalable deep learning architectures with an iterative denoising process. Deep learning has already reached high accuracy in this area, as exemplified by AlphaFold2 in protein structure prediction (AlphaFold2 is not itself a diffusion model), and in recent years diffusion models have achieved a series of advances in protein structure generation and design.

The paper first introduces the history and theoretical foundations of diffusion models and the techniques commonly used when applying them to biomolecules. It then discusses key concepts such as controlled generation, equivariant and invariant score networks, and diffusion on manifolds. Finally, it gives an overview of applications in molecular generation and in predictive tasks such as protein structure prediction and ligand-protein docking, and presents a timeline of related research. Overall, the paper aims to give readers a comprehensive understanding of the role of diffusion models in the study of biomolecules, including their advantages and potential limitations, in order to promote further research in this field.
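
To make the noising and denoising idea in the summary concrete, here is a minimal sketch in Python with NumPy. It is not taken from the paper. It assumes a one-dimensional Gaussian data distribution, so the score of every noised distribution is known in closed form and no neural network has to be trained; in a real diffusion model that score would be approximated by a learned network. The function names, the linear noise schedule, and the values of `mu` and `sigma` are illustrative assumptions.

```python
import numpy as np


def make_schedule(num_steps=1000, beta_min=1e-4, beta_max=0.02):
    """Linear noise schedule (an assumed, commonly used choice)."""
    betas = np.linspace(beta_min, beta_max, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)  # cumulative signal retention per step
    return betas, alphas, alpha_bars


def forward_noise(x0, t, alpha_bars, rng):
    """Jump directly to noise level t: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def gaussian_score(x, t, alpha_bars, mu, sigma):
    """Exact score of the noised marginal when the data are N(mu, sigma^2).

    Stand-in for a trained score network (assumption for this toy example).
    """
    mean_t = np.sqrt(alpha_bars[t]) * mu
    var_t = alpha_bars[t] * sigma**2 + (1.0 - alpha_bars[t])
    return -(x - mean_t) / var_t


def reverse_sample(n, betas, alphas, alpha_bars, mu, sigma, rng):
    """Start from the standard normal prior and denoise step by step."""
    x = rng.standard_normal(n)
    for t in range(len(betas) - 1, -1, -1):
        score = gaussian_score(x, t, alpha_bars, mu, sigma)
        # Posterior-mean update written in terms of the score.
        x = (x + betas[t] * score) / np.sqrt(alphas[t])
        if t > 0:
            # Re-inject a small amount of noise on every step except the last.
            x = x + np.sqrt(betas[t]) * rng.standard_normal(n)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas, alphas, alpha_bars = make_schedule()
    mu, sigma = 2.0, 0.5  # hypothetical data distribution
    samples = reverse_sample(20000, betas, alphas, alpha_bars, mu, sigma, rng)
    print("sample mean:", samples.mean(), "sample std:", samples.std())
```

The forward function shows how data are pushed toward a simple prior, and the reverse loop shows sampling by iterative denoising. For biomolecules, the scalar `x` would be replaced by atomic coordinates, torsion angles, or sequence features, and the closed-form score by a trained (often equivariant) network.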