Scalable Weibull Graph Attention Autoencoder for Modeling Document Networks

Chaojie Wang, Xinyang Liu, Dongsheng Wang, Hao Zhang, Bo Chen, Mingyuan Zhou
2024-10-13
Abstract: Although existing variational graph autoencoders (VGAEs) have been widely used for modeling and generating graph-structured data, most of them are still not flexible enough to approximate the sparse and skewed latent node representations, especially those of document relational networks (DRNs) with discrete observations. To analyze a collection of interconnected documents, a typical branch of Bayesian models, specifically relational topic models (RTMs), has proven its efficacy in describing both link structures and document contents of DRNs, which motivates us to incorporate RTMs with existing VGAEs to alleviate their potential issues when modeling the generation of DRNs. In this paper, moving beyond the sophisticated approximate assumptions of traditional RTMs, we develop a graph Poisson factor analysis (GPFA), which provides analytic conditional posteriors to improve the inference accuracy, and extend GPFA to a multi-stochastic-layer version named graph Poisson gamma belief network (GPGBN) to capture the hierarchical document relationships at multiple semantic levels. Then, taking GPGBN as the decoder, we combine it with various Weibull-based graph inference networks, resulting in two variants of Weibull graph auto-encoder (WGAE), equipped with model inference algorithms. Experimental results demonstrate that our models can extract high-quality hierarchical latent document representations and achieve promising performance on various graph analytic tasks.
Machine Learning
What problem does this paper attempt to address?
This paper addresses the limited flexibility of existing variational graph autoencoders (VGAEs) when modeling and generating graph-structured data, in particular their difficulty in approximating the sparse and skewed latent node representations of document relational networks (DRNs) with discrete observations. Specifically:

1. **Limitations of existing VGAEs**:
   - Most existing VGAEs rely on Gaussian reparameterization to construct latent node representations, which often approximates sparse and skewed latent node representations poorly, especially for DRNs with discrete observations.
   - The latent semantic structures these methods learn from network data, such as the underlying topic connections between documents, are usually difficult to interpret.
2. **Advantages of introducing RTMs**:
   - Relational topic models (RTMs) have proven effective at jointly modeling document contents and the link structure between documents.
   - RTMs explain each link through shared topic selection, which enables interpretable link-structure prediction.
3. **Improved models**:
   - The paper proposes a new RTM based on the Poisson-gamma distribution, called graph Poisson factor analysis (GPFA), which provides analytic conditional posteriors to improve inference accuracy.
   - It extends GPFA to the graph Poisson gamma belief network (GPGBN), which has multiple stochastic layers, to capture hierarchical document relationships at multiple semantic levels.
   - It combines Weibull-based graph inference networks (the encoder) with GPGBN (the decoder), yielding two variants of the Weibull graph auto-encoder (WGAE) that avoid the limitations of Gaussian-based VGAEs and can be applied flexibly to various graph analytic tasks.
4. **Specific objectives**:
   - Extract high-quality hierarchical latent document representations.
   - Achieve strong performance on various graph analytic tasks.
   - Realize interpretable link-structure prediction at multiple semantic levels.

Through these improvements, the paper addresses the limitations of existing VGAEs in handling sparse and skewed latent node representations and in interpreting latent semantic structures, especially for document relational networks.
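
The contrast between Gaussian and Weibull reparameterization drawn in this summary can be illustrated with a minimal sketch. It is not the paper's inference network: the function names, parameter values, and sample count below are assumptions chosen for illustration. The only standard fact it relies on is that a Weibull sample with shape $k$ and scale $\lambda$ can be obtained by inverting the CDF on uniform noise, $x = \lambda(-\ln(1-\epsilon))^{1/k}$ with $\epsilon \sim \mathrm{Uniform}(0,1)$, so the sample is a deterministic, differentiable function of the parameters.

```python
import math
import random


def sample_weibull(shape_k, scale_lam, rng):
    """Draw one Weibull(k, lambda) sample by inverting the CDF on uniform noise."""
    eps = rng.random()  # eps in [0, 1), so 1 - eps is in (0, 1] and log is defined
    # Inverse CDF: x = lambda * (-ln(1 - eps))^(1/k)
    return scale_lam * (-math.log(1.0 - eps)) ** (1.0 / shape_k)


def sample_gaussian(mean, std, rng):
    """Draw one Gaussian sample via the usual location-scale reparameterization."""
    return mean + std * rng.gauss(0.0, 1.0)


# Hypothetical parameter values, chosen only to make the contrast visible.
rng = random.Random(0)
weibull_draws = [sample_weibull(0.5, 1.0, rng) for _ in range(10000)]
gaussian_draws = [sample_gaussian(0.0, 1.0, rng) for _ in range(10000)]

weibull_mean = sum(weibull_draws) / len(weibull_draws)
weibull_median = sorted(weibull_draws)[len(weibull_draws) // 2]

print("smallest Weibull draw:", min(weibull_draws))
print("smallest Gaussian draw:", min(gaussian_draws))
print("Weibull mean vs. median:", weibull_mean, weibull_median)
```

With a shape parameter below 1, the Weibull draws are nonnegative and concentrated near zero with a long right tail, so the mean sits well above the median. Gaussian draws are symmetric and include negative values, which is why they are a poor match for sparse, skewed, nonnegative latent representations.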