Graph Community Augmentation with GMM-based Modeling in Latent Space

Shintaro Fukushima, Kenji Yamanishi
2024-12-02
Abstract: This study addresses graph generation with generative models. In particular, we consider the graph community augmentation problem: generating unseen or unfamiliar graphs that contain a new community, drawn from the probability distribution estimated from a given graph dataset. Graphs with a new community may reveal unseen but important structures, for example in a social network such as a purchaser network. Graph community augmentation may also help data mining models generalize when it is difficult to collect enough real graph data. There are many ways to generate a new community in an existing graph. It is desirable to discover a graph with a new community beyond the given graphs while preserving the structure of the original graphs to some extent, so that the generated graphs remain realistic. To this end, we propose an algorithm called graph community augmentation (GCA). The key ideas of GCA are (i) to fit a Gaussian mixture model (GMM) to the data points in the latent space into which the nodes of the original graph are embedded, and (ii) to add data points in a new cluster in the latent space to generate a new community, based on the minimum description length (MDL) principle. We empirically demonstrate the effectiveness of GCA for generating graphs with a new community structure on synthetic and real datasets.
Machine Learning, Information Theory
### What problem does this paper attempt to solve?

This paper addresses the **Graph Community Augmentation (GCA) problem**. The objective is to generate unseen or unfamiliar graphs that contain new community structures, sampled from the probability distribution estimated from a given graph dataset.

#### Definition of the Graph Community Augmentation problem

The Graph Community Augmentation problem is to generate new graphs that contain community structures absent from the existing graph dataset, using the probability distribution estimated from that dataset. Unlike traditional graph data augmentation methods, graph community augmentation does not merely modify the existing graph structure (such as node attributes, labels, or edge connections); it generates entirely new community structures. In a social network, for example, this may mean discovering new groups of purchasers or knowledge clusters.

#### Why is the Graph Community Augmentation problem important?

1. **Discovering potentially important structures**: Generating graphs with new community structures makes it possible to discover important graph structures that do not appear in the dataset. In a purchaser network, for example, a new community may indicate a category or group that could emerge in the future.
2. **Generalization of data-mining models**: When it is difficult to collect enough real graph data, graph community augmentation can supplement the training data with graphs that contain new community structures, which may improve the robustness and adaptability of data-mining models.
3. **Maintaining the realism of the original graph structure**: Preserving the basic structure of the original graphs keeps the generated graphs close to real-world data.

#### Main contributions of the research

1. **A new formulation of the graph community augmentation problem**: The original graph is embedded into a latent space and represented with a Gaussian Mixture Model (GMM). Each cluster in the GMM can correspond to a substructure (such as a community) in the original graph. Graph community augmentation is achieved by decoding new clusters.
2. **An algorithm based on the Minimum Description Length (MDL) principle**: The paper proposes a concrete algorithm with a training phase and a community augmentation phase. In the training phase, an autoencoder embeds the original graph into the latent space, and the number of GMM clusters is selected according to the MDL principle. In the community augmentation phase, new clusters are added subject to novelty and reliability conditions.
3. **Experimental validation of the algorithm**: Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed GCA algorithm in generating graphs with new community structures.

### Summary

This paper addresses the graph community augmentation problem: generating new graphs with new community structures from a given dataset. It combines autoencoders, GMMs, and the MDL principle into an algorithm whose performance is verified experimentally. The approach can help discover potentially important graph structures and offers a new route to improving the generalization of data-mining models.
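The two phases described above can be illustrated with a small sketch in latent space. It is not the authors' implementation and rests on several assumptions. The latent embeddings `Z` are toy data rather than autoencoder output, and the GMM uses one variance per component. The number of clusters is chosen with a BIC-style two-part code length, used here as a simple stand-in for the paper's MDL criterion. The novelty and reliability conditions are replaced by hypothetical thresholds `lo` and `hi` on the normalized distance to the nearest existing cluster. Decoding the new latent points back into a graph is omitted.

```python
import numpy as np

def log_weighted_density(Z, weights, means, variances):
    """Log of weight_j * N(z_i | mean_j, variance_j * I), shape (n, k)."""
    d = Z.shape[1]
    sq_dist = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return (np.log(weights)
            - 0.5 * d * np.log(2 * np.pi * variances)
            - 0.5 * sq_dist / variances)

def fit_spherical_gmm(Z, k, n_iter=100, seed=0):
    """Plain EM for a GMM with one variance per component (a simplification)."""
    rng = np.random.default_rng(seed)
    n, d = Z.shape
    means = Z[rng.choice(n, size=k, replace=False)].copy()
    variances = np.full(k, Z.var())
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: responsibilities of each component for each point
        log_p = log_weighted_density(Z, weights, means, variances)
        resp = np.exp(log_p - np.logaddexp.reduce(log_p, axis=1)[:, None])
        # M-step: re-estimate weights, means, and variances
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ Z) / nk[:, None]
        sq_dist = ((Z[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        variances = (resp * sq_dist).sum(axis=0) / (d * nk) + 1e-4
    log_lik = np.logaddexp.reduce(
        log_weighted_density(Z, weights, means, variances), axis=1).sum()
    return weights, means, variances, log_lik

def code_length(log_lik, k, n, d):
    """Two-part code length: data cost plus (p / 2) * log(n) parameter cost."""
    n_params = k * d + k + (k - 1)  # means, variances, free weights
    return -log_lik + 0.5 * n_params * np.log(n)

def select_k(Z, k_max=5):
    """Training phase (assumed form): keep the k with the shortest code length."""
    n, d = Z.shape
    fits = {k: fit_spherical_gmm(Z, k) for k in range(1, k_max + 1)}
    best_k = min(fits, key=lambda k: code_length(fits[k][3], k, n, d))
    return best_k, fits[best_k]

def propose_new_mean(Z, means, variances, lo=3.0, hi=6.0, n_tries=2000, seed=1):
    """Hypothetical novelty/reliability rule: the candidate must be at least
    `lo` and at most `hi` standard deviations from its nearest cluster."""
    rng = np.random.default_rng(seed)
    center, spread = Z.mean(axis=0), 2.0 * Z.std(axis=0)
    for cand in rng.normal(center, spread, size=(n_tries, Z.shape[1])):
        nearest = (np.linalg.norm(means - cand, axis=1) / np.sqrt(variances)).min()
        if lo <= nearest <= hi:
            return cand
    raise ValueError("no candidate satisfied both conditions")

def augment(weights, means, variances, new_mean):
    """Augmentation phase (assumed form): append one equally weighted cluster."""
    k = len(weights)
    new_weights = np.append(weights * k / (k + 1), 1.0 / (k + 1))
    new_means = np.vstack([means, new_mean])
    new_variances = np.append(variances, variances.mean())
    return new_weights, new_means, new_variances

# Toy latent embeddings: two well-separated groups of nodes in 2-D
rng = np.random.default_rng(0)
Z = np.vstack([rng.normal([0, 0], 1.0, size=(150, 2)),
               rng.normal([6, 0], 1.0, size=(150, 2))])
k, (weights, means, variances, _) = select_k(Z)
new_mean = propose_new_mean(Z, means, variances)
aug_weights, aug_means, aug_variances = augment(weights, means, variances, new_mean)
# Latent points for the new community; a decoder would turn these into nodes
new_points = rng.normal(new_mean, np.sqrt(aug_variances[-1]), size=(50, 2))
print(k, new_mean, aug_weights.round(3))
```

The shell defined by `lo` and `hi` captures the trade-off the summary describes: the new cluster must be far enough from existing clusters to count as a new community, yet close enough to the fitted distribution for the decoded graph to stay realistic. The actual conditions in the paper may differ.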