Sparse Training of Discrete Diffusion Models for Graph Generation

Yiming Qin, Clement Vignac, Pascal Frossard
2024-05-23
Abstract: Generative graph models struggle to scale due to the need to predict the existence or type of edges between all node pairs. To address the resulting quadratic complexity, existing scalable models often impose restrictive assumptions such as a cluster structure within graphs, thus limiting their applicability. To overcome this limitation, we introduce SparseDiff, a novel diffusion model based on the observation that almost all large graphs are sparse. By selecting a subset of edges, SparseDiff effectively leverages sparse graph representations both during the noising process and within the denoising network, which ensures that space complexity scales linearly with the number of chosen edges. During inference, SparseDiff progressively fills the adjacency matrix with the selected subsets of edges, mirroring the training process. Our model demonstrates state-of-the-art performance across multiple metrics on both small and large datasets, confirming its effectiveness and robustness across varying graph sizes. It also converges faster, particularly on larger graphs, achieving a fourfold speedup on the large Ego dataset compared to dense models, thereby paving the way for broader applications.
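The idea of progressively filling the adjacency matrix can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the authors' implementation: it performs a single pass and ignores the diffusion timesteps, it partitions all node pairs at random into chunks of a chosen size, and it uses a hypothetical `predict_edges` callback in place of the denoising network. Because each call sees only the current edge list plus one chunk of query pairs, memory per step scales with the number of edges and the chunk size rather than with n².

```python
import itertools
import random


def all_node_pairs(n):
    """Upper-triangular node pairs (i < j) of an undirected graph with n nodes."""
    return list(itertools.combinations(range(n), 2))


def progressive_fill(n, chunk_size, predict_edges, seed=0):
    """Fill the adjacency structure of an n-node graph chunk by chunk.

    predict_edges is a HYPOTHETICAL stand-in for a denoising network: given
    the current edge list and a list of query pairs, it returns the subset
    of query pairs it predicts to be edges.
    """
    rng = random.Random(seed)
    pairs = all_node_pairs(n)
    rng.shuffle(pairs)  # random partition of all node pairs into chunks
    edges = set()
    for start in range(0, len(pairs), chunk_size):
        queries = pairs[start:start + chunk_size]
        # Each step only sees the existing edges plus at most chunk_size
        # query pairs, so per-step memory is linear in (#edges + chunk_size).
        edges.update(predict_edges(sorted(edges), queries))
    return edges


def toy_predictor(current_edges, queries):
    """Toy predictor: marks consecutive node pairs as edges (a path graph)."""
    return [(i, j) for (i, j) in queries if j == i + 1]


path = progressive_fill(5, chunk_size=3, predict_edges=toy_predictor)
print(sorted(path))  # [(0, 1), (1, 2), (2, 3), (3, 4)]
```

Every node pair is queried exactly once across the chunks, so the final edge set does not depend on the random partition in this toy case. In a real model, later chunks could condition on edges predicted in earlier ones.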
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The paper addresses the poor scalability of generative graph models. Because these models must predict the existence or type of an edge for every pair of nodes, their computational cost grows quadratically with the number of nodes, which limits their use on large graphs. Existing scalable models usually reduce this cost by imposing restrictive assumptions, such as a cluster structure within the graph, which narrows their scope of application. To overcome these problems, the paper proposes SparseDiff, a discrete diffusion model that exploits graph sparsity so that its space complexity grows linearly with the number of selected edges. SparseDiff performs well on multiple metrics and converges faster on large graphs: on the large Ego dataset, it achieves a fourfold speedup compared to dense models. This paves the way for a wider range of applications.
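The gap between quadratic and linear space complexity can be made concrete with a small sketch. It assumes a hypothetical training-time selection rule (keep every existing edge and add a fixed number of uniformly sampled non-edge pairs); the paper's exact sampling scheme may differ. The function `sample_training_pairs` and its parameters are illustrative only.

```python
import random


def sample_training_pairs(edges, n, num_extra, seed=0):
    """HYPOTHETICAL sparse training subset: all existing edges plus up to
    num_extra uniformly sampled node pairs that are not already chosen
    (i.e. non-edges). Pairs are stored as (i, j) with i < j."""
    rng = random.Random(seed)
    chosen = {tuple(sorted(e)) for e in edges}
    max_pairs = n * (n - 1) // 2
    target = min(len(chosen) + num_extra, max_pairs)
    while len(chosen) < target:
        i, j = rng.sample(range(n), 2)  # two distinct nodes
        chosen.add((min(i, j), max(i, j)))
    return chosen


n = 1000
cycle = [(i, (i + 1) % n) for i in range(n)]  # sparse graph: n edges
pairs = sample_training_pairs(cycle, n, num_extra=1000)
print(len(pairs), n * (n - 1) // 2)  # 2000 499500
```

Here the sparse subset stores the number of edges plus `num_extra` pairs (2,000), while a dense representation must cover all n(n-1)/2 = 499,500 node pairs; doubling n roughly quadruples the dense count but only doubles the sparse one for a graph like this cycle.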