Generative Diffusion Models on Graphs: Methods and Applications

Chengyi Liu, Wenqi Fan, Yunqing Liu, Jiatong Li, Hang Li, Hui Liu, Jiliang Tang, Qing Li
2023-08-26
Abstract: Diffusion models, as a novel generative paradigm, have achieved remarkable success in various image generation tasks such as image inpainting, image-to-text translation, and video generation. Graph generation is a crucial computational task on graphs with numerous real-world applications. It aims to learn the distribution of given graphs and then generate new graphs. Given the great success of diffusion models in image generation, increasing efforts have been made to leverage these techniques to advance graph generation in recent years. In this paper, we first provide a comprehensive overview of generative diffusion models on graphs. In particular, we review representative algorithms for three variants of graph diffusion models, i.e., Score Matching with Langevin Dynamics (SMLD), Denoising Diffusion Probabilistic Model (DDPM), and Score-based Generative Model (SGM). Then, we summarize the major applications of generative diffusion models on graphs with a specific focus on molecule and protein modeling. Finally, we discuss promising directions in generative diffusion models on graph-structured data. For this survey, we also created a GitHub project website collecting the supporting resources for generative diffusion models on graphs: https://github.com/ChengyiLIU-cs/Generative-Diffusion-Models-on-Graphs
Machine Learning, Artificial Intelligence, Social and Information Networks
What problem does this paper attempt to address?
The problem this paper attempts to address is: how to apply Diffusion Models to graph-structured data to generate new graph structures, particularly in the context of molecular and protein modeling.

### Background and Motivation

- **Importance of Graph Generation Tasks**: Graph generation is a crucial task in graph computing with extensive practical applications, such as drug discovery and materials science. The goal is to learn from a given graph distribution and generate new graph structures.
- **Limitations of Existing Methods**:
  - **Variational Autoencoders (VAE)**: Difficulties in posterior estimation when generating large-scale graphs and the need for expensive computations to achieve permutation invariance.
  - **Generative Adversarial Networks (GAN)**: Prone to mode collapse and requires additional computation to train the discriminator.
  - **Normalizing Flows**: Due to architectural constraints, it is challenging to fully learn the structural information of graphs.

### Advantages of Diffusion Models

- **Theoretical Foundation**: Diffusion models are based on non-equilibrium thermodynamics theory, utilizing Markov chains for forward and reverse diffusion processes.
- **Ease of Handling Probabilistic Parameters**: The probabilistic parameters of diffusion models are easy to handle, leading to significant success in tasks such as image generation, text-to-image translation, and molecular graph modeling.

### Main Contributions of the Paper

1. **Review of Diffusion Models on Graphs**: The paper provides a comprehensive overview of the application of diffusion models on graphs, particularly focusing on three main variants: Score Matching with Langevin Dynamics (SMLD), Denoising Diffusion Probabilistic Models (DDPM), and Score-based Generative Models (SGM).
2. **Key Application Areas**: Highlights the application of diffusion models in molecular generation and protein modeling.
3. **Future Research Directions**: Discusses future research directions for diffusion models on graph-structured data.

### Main Content

- **Fundamentals**: Introduces graph representations, existing deep generative models (such as VAE, GAN, and Normalizing Flows), and their applications in graph generation.
- **Diffusion Models**:
  - **SMLD**: Generates graphs by gradually adding Gaussian noise and learning the gradient of the data distribution.
  - **DDPM**: Perturbs the original data to a standard Gaussian distribution through forward diffusion and then recovers the original data through reverse diffusion.
  - **SGM**: Extends discrete diffusion steps to continuous time, modeling the diffusion process through Stochastic Differential Equations (SDE).
- **Diffusion Models on Graphs**:
  - **Applications of SMLD on Graphs**: Such as EDP-GNN and ConfGF.
  - **Applications of DDPM on Graphs**: Such as DiGress and E(3) Equivariant Diffusion Model.
  - **Applications of SGM on Graphs**: Such as GraphGDP and GDSS.

### Application Examples

- **Molecular Modeling**:
  - **Molecular Conformation Generation**: Such as GeoDiff, DGSM, Torsional Diffusion, and E(3) Equivariant Diffusion Model.
  - **Molecular Docking**: Predicting interactions between molecules by generating new molecular conformations.

### Conclusion

The paper systematically summarizes the application of diffusion models on graph-structured data, particularly in the fields of molecular and protein modeling, and points out future research directions. This provides valuable references for researchers and promotes further development in this field.
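
To make the DDPM forward diffusion mentioned in the summary concrete, here is a minimal sketch of the standard closed-form Gaussian noising step, q(A_t | A_0) = N(sqrt(ᾱ_t) A_0, (1 - ᾱ_t) I), applied to the adjacency matrix of a small undirected graph. It is an illustration under stated assumptions, not the procedure of any specific method named above: the function names, the linear noise schedule (1e-4 to 0.02 over 1000 steps, a commonly used default), and the choice to add noise only to the upper triangle and mirror it are all assumptions made here. Graph diffusion methods may use different noise models, including discrete ones.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_0 .. beta_{T-1} (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bar(betas, t):
    """Cumulative product of (1 - beta_s) for s = 0..t."""
    prod = 1.0
    for beta in betas[: t + 1]:
        prod *= 1.0 - beta
    return prod


def forward_diffuse(adj, t, betas, rng):
    """Sample a noisy adjacency matrix A_t given the clean matrix A_0.

    Noise is drawn for the upper triangle and mirrored, so the result
    stays symmetric with a zero diagonal (an assumption that suits
    undirected graphs without self-loops).
    """
    n = len(adj)
    a_bar = alpha_bar(betas, t)
    signal = math.sqrt(a_bar)        # weight on the original graph
    sigma = math.sqrt(1.0 - a_bar)   # standard deviation of the added noise
    noisy = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            value = signal * adj[i][j] + sigma * rng.gauss(0.0, 1.0)
            noisy[i][j] = noisy[j][i] = value
    return noisy


# A 4-node path graph: 0 - 1 - 2 - 3
adj = [
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0, 0.0],
]
betas = linear_beta_schedule(1000)
rng = random.Random(0)
early = forward_diffuse(adj, 10, betas, rng)    # still close to the input graph
late = forward_diffuse(adj, 999, betas, rng)    # close to pure Gaussian noise
```

At a small step such as `t = 10` the sample is a lightly perturbed copy of the input, while at the final step the cumulative signal weight is close to zero and the sample is nearly pure noise. A reverse model would be trained to undo this corruption step by step; that part is not shown here.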