AGSEI: Adaptive Graph Structure Estimation with Long-Tail Distributed Implicit Graphs
Yunfei He, Yang Wu, Lishan Huang, Zhenwan Peng, Fei Yang, Yiwen Zhang, Victor S Sheng
DOI: https://doi.org/10.1109/tetc.2024.3480132
2024-01-01
IEEE Transactions on Emerging Topics in Computing
Abstract: Graph neural networks (GNNs) are powerful tools for embedding graph-structured data and have found applications across various domains. Most GNNs assume, often implicitly, that the underlying graph structure is reliable. When it is not, misleading information can propagate through structures such as false links. In response to this challenge, numerous methods for graph structure learning (GSL) have been developed. One popular approach constructs a simple and intuitive K-nearest neighbor (KNN) graph and uses it as a sample from which to infer the true graph structure. However, a KNN graph can easily mislead the estimation of the true graph structure. From a statistical perspective, the KNN graph, as the sample, follows a single-point distribution, whereas the true graph structure, as the population, mostly follows a long-tail distribution. In theory, the sample and the population should share the same distribution; otherwise, accurately inferring the true graph structure becomes challenging. To address this problem, this paper proposes Adaptive Graph Structure Estimation with Long-Tail Distributed Implicit Graphs, referred to as AGSEI. AGSEI comprises three main components: long-tail implicit graph construction, explicit graph structure estimation, and joint optimization. The first component relies on a multi-layer graph convolutional network to learn low-order to high-order node representations, compute node similarity, and construct several corresponding long-tail implicit graphs. Because the original, imperfect graph structure can mislead GNNs into propagating false information, it reduces the reliability of the long-tail implicit graphs. AGSEI attempts to limit the aggregation of irrelevant information by introducing the Hilbert-Schmidt independence criterion, that is, by maximizing the dependence between the predicted labels and the ground truth. With this strategy, AGSEI can learn node features that depend on the labels, which facilitates the construction of reliable long-tail implicit graphs and provides adaptive multi-view graph structure information to support subsequent GSL. In the second component, the graph structure is estimated using the stochastic block model (SBM) with the Expectation-Maximization algorithm. Because it is difficult for a single GSL estimate to approach the true graph structure, the third component jointly optimizes the long-tail implicit graph construction and the explicit graph structure estimation, alternating between the two until the model converges. We conducted multiple experiments on five public datasets, covering tasks such as classification and clustering. These experiments not only demonstrate the performance of AGSEI but also confirm that the graph structures it estimates follow a long-tail distribution.
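The statement that a KNN graph follows a single-point distribution can be illustrated with a minimal sketch. It assumes the statement refers to node out-degree in a directed KNN graph: every node selects exactly k neighbors, so all out-degrees equal k. The abstract does not specify the similarity measure or the construction details, so cosine similarity, the function name `knn_out_degrees`, and the random features below are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def knn_out_degrees(features: np.ndarray, k: int) -> np.ndarray:
    """Out-degree of every node in a directed KNN graph (hypothetical helper).

    Assumption: neighbors are chosen by cosine similarity; the paper may
    use a different measure.
    """
    # Normalize rows so that the dot product equals cosine similarity.
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-loops

    # Column indices of the k most similar nodes for each row.
    neighbours = np.argpartition(-sim, k - 1, axis=1)[:, :k]

    # Build the directed adjacency matrix: row i points to its k neighbors.
    n = sim.shape[0]
    adjacency = np.zeros((n, n), dtype=int)
    rows = np.repeat(np.arange(n), k)
    adjacency[rows, neighbours.ravel()] = 1
    return adjacency.sum(axis=1)


# Synthetic node features, used only for illustration.
rng = np.random.default_rng(0)
features = rng.normal(size=(50, 8))
degrees = knn_out_degrees(features, k=5)
print(np.unique(degrees))  # a single distinct value: every out-degree is k
```

Under these assumptions the out-degree histogram collapses to a single value, whereas degree histograms of real-world graphs are typically heavy-tailed. This is the sample-versus-population mismatch the abstract describes.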