Representing Graphs Via Gromov-Wasserstein Factorization

Hongteng Xu,Jiachang Liu,Dixin Luo,Lawrence Carin
DOI: https://doi.org/10.1109/tpami.2022.3153126
IF: 23.6
2023-01-01
IEEE Transactions on Pattern Analysis and Machine Intelligence
Abstract:Graph representation is a challenging and significant problem for many real-world applications. In this work, we propose a novel paradigm called “Gromov-Wasserstein Factorization” (GWF) to learn graph representations in a flexible and interpretable way. Given a set of graphs, whose correspondence between nodes is unknown and whose sizes can be different, our GWF model reconstructs each graph by a weighted combination of some “graph factors” under a pseudo-metric called Gromov-Wasserstein (GW) discrepancy. This model leads to a new nonlinear factorization mechanism of the graphs. The graph factors are shared by all the graphs, which represent the typical patterns of the graphs’ structures. The weights associated with each graph indicate the graph factors’ contributions to the graph's reconstruction, which lead to a permutation-invariant graph representation. We learn the graph factors of the GWF model and the weights of the graphs jointly by minimizing the overall reconstruction error. When learning the model, we reparametrize the graph factors and the weights to unconstrained model parameters and simplify the backpropagation of gradient with the help of the envelope theorem. For the GW discrepancy (the critical training step), we consider two algorithms to compute it, which correspond to the proximal point algorithm (PPA) and Bregman alternating direction method of multipliers (BADMM), respectively. Furthermore, we propose some extensions of the GWF model, including (i) combining with a graph neural network and learning graph representations in an auto-encoding manner, (ii) representing the graphs with node attributes, and (iii) working as a regularizer for semi-supervised graph classification. Experiments on various datasets demonstrate that our GWF model is comparable to the state-of-the-art methods. The graph representations derived by it perform well in graph clustering and classification tasks.
What problem does this paper attempt to address?