Disambiguated Node Classification with Graph Neural Networks

Tianxiang Zhao, Xiang Zhang, Suhang Wang
DOI: https://doi.org/10.1145/3589334.3645637
2024-02-14
Abstract: Graph Neural Networks (GNNs) have demonstrated significant success in learning from graph-structured data across various domains. Despite this success, one critical challenge is often overlooked by existing works: learning message propagation that generalizes effectively to underrepresented graph regions. These minority regions often exhibit irregular homophily/heterophily patterns and diverse neighborhood class distributions, resulting in ambiguity. In this work, we investigate the ambiguity problem within GNNs, its impact on representation learning, and the development of richer supervision signals to counter it. We conduct a fine-grained evaluation of GNNs, analyzing the existence of ambiguity in different graph regions and its relation to node positions. To disambiguate node embeddings, we propose a novel method, DisamGCL, which exploits additional optimization guidance to enhance representation learning, particularly for nodes in ambiguous regions. DisamGCL identifies ambiguous nodes based on the temporal inconsistency of their predictions and introduces a disambiguation regularization by employing contrastive learning in a topology-aware manner. DisamGCL promotes the discriminability of node representations and can alleviate the semantic mixing caused by message propagation, effectively addressing the ambiguity problem. Empirical results validate the efficiency of DisamGCL and highlight its potential to improve GNN performance in underrepresented graph regions.
Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
This paper addresses the representation ambiguity that Graph Neural Networks (GNNs) face in graph regions with differing homophily and heterophily patterns. Specifically, it examines how message propagation can blur node representations in underrepresented regions of a graph, particularly those with irregular structure or few samples. The authors propose a method called DisamGCL, which improves node classification in these ambiguous regions by enriching the supervision signal. DisamGCL identifies ambiguous nodes by the temporal inconsistency of their predictions, then applies a topology-aware contrastive learning objective as a disambiguation regularizer to make node representations more discriminative. The reported empirical results indicate that DisamGCL can improve GNN performance in these challenging regions.
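To make the idea of "temporal inconsistency of predictions" concrete, the following is a minimal sketch, not the authors' implementation. It assumes that predicted labels are recorded for every node at several training checkpoints, and it scores each node by how often its predicted label flips between consecutive checkpoints. The function names, the flip-rate score, and the threshold are all hypothetical choices made for illustration; the abstract does not specify how DisamGCL measures inconsistency.

```python
def temporal_inconsistency(pred_history):
    """Score each node by how often its predicted label changes.

    pred_history[t][i] is the predicted class of node i at checkpoint t.
    Returns one score per node in [0, 1]: the fraction of consecutive
    checkpoint pairs on which the node's prediction flipped.
    NOTE: this flip-rate score is an illustrative assumption, not the
    measure defined in the paper.
    """
    num_steps = len(pred_history)
    if num_steps < 2:
        raise ValueError("need at least two checkpoints to measure flips")
    num_nodes = len(pred_history[0])
    scores = []
    for i in range(num_nodes):
        flips = sum(
            pred_history[t][i] != pred_history[t - 1][i]
            for t in range(1, num_steps)
        )
        scores.append(flips / (num_steps - 1))
    return scores


def select_ambiguous(pred_history, threshold=0.5):
    """Return indices of nodes whose flip rate meets a (hypothetical) threshold."""
    scores = temporal_inconsistency(pred_history)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy example: 3 checkpoints, 3 nodes. Node 1 changes label at every step.
history = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 1],
]
scores = temporal_inconsistency(history)
ambiguous = select_ambiguous(history, threshold=0.5)
```

In this toy history only node 1 is flagged, since its prediction changes at every checkpoint while the other two stay stable. In a pipeline of the kind the abstract describes, the flagged nodes would then be the ones that receive the additional contrastive regularization.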