Boosting Pseudo-Labeling With Curriculum Self-Reflection for Attributed Graph Clustering

Pengfei Zhu, Jialu Li, Yu Wang, Bin Xiao, Jinglin Zhang, Wanyu Lin, Qinghua Hu
DOI: https://doi.org/10.1109/TNNLS.2024.3416167
2024-09-09
Abstract: Attributed graph clustering is an unsupervised learning task that aims to partition the nodes of a graph into distinct groups. Existing approaches focus on devising diverse pretext tasks to obtain suitable supervision for representation learning, among which the predictive methods show great potential. However, these methods 1) introduce bias from the auxiliary task toward the clustering target and 2) introduce label noise due to static thresholds. To address these issues, we propose a new self-supervised learning method, namely, pseudo-labeling with curriculum self-reflection (PLCSR), that learns reliable pseudo-labels by mining its own information to process nodes progressively in a self-reflection manner. First, a self-auxiliary encoder is constructed using the exponential moving average (EMA) of the original encoder's parameters to replace the auxiliary tasks, which provides an additional perspective for finding highly confident pseudo-labels. Second, a curriculum selection strategy using dynamic thresholds is designed to exploit graph nodes more fully and accurately. Besides simple nodes with high confidence at the initial stage, nodes that yield consistent predictions from both encoders are then assigned pseudo-labels to avoid the under-learning problem. For the remaining difficult nodes, which are highly uncertain, we abstain from making judgments to minimize their adverse impact on the model. Extensive experiments show that PLCSR significantly outperforms the state-of-the-art predictive method CDRS, improving clustering accuracy by more than 6%. The code is available at: https://github.com/Jillian555/PLCSR.
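The two mechanisms named in the abstract, the EMA self-auxiliary encoder and the curriculum selection with dynamic thresholds, can be illustrated with a minimal sketch. This is not the authors' implementation (see the linked repository for that). The function names, the decay value, the linear threshold schedule, and the exact selection rule are all assumptions made for illustration; the abstract does not specify any of them.

```python
# Illustrative sketch only: names, decay value, threshold schedule, and
# selection rule are assumptions, not taken from the PLCSR code.

def ema_update(aux_params, enc_params, decay=0.99):
    """Move the self-auxiliary parameters toward the encoder's parameters
    with an exponential moving average (decay value is an assumption)."""
    return [decay * a + (1.0 - decay) * e for a, e in zip(aux_params, enc_params)]


def dynamic_threshold(epoch, total_epochs, start=0.9, end=0.6):
    """Hypothetical schedule: interpolate linearly from `start` to `end`."""
    frac = min(max(epoch / max(total_epochs - 1, 1), 0.0), 1.0)
    return start + (end - start) * frac


def argmax(row):
    """Index of the largest entry in a list of cluster probabilities."""
    return max(range(len(row)), key=row.__getitem__)


def select_pseudo_labels(probs_main, probs_aux, threshold):
    """Return one pseudo-label per node, or None to abstain.

    Assumed rule: a node is 'simple' if the main encoder's confidence reaches
    the threshold; otherwise it is labeled only if both encoders predict the
    same cluster; all remaining nodes are left unlabeled.
    """
    labels = []
    for p_main, p_aux in zip(probs_main, probs_aux):
        k_main, k_aux = argmax(p_main), argmax(p_aux)
        if p_main[k_main] >= threshold:
            labels.append(k_main)   # simple node: high confidence
        elif k_main == k_aux:
            labels.append(k_main)   # both encoders agree
        else:
            labels.append(None)     # difficult node: abstain
    return labels


if __name__ == "__main__":
    main = [[0.95, 0.05], [0.60, 0.40], [0.55, 0.45]]
    aux = [[0.90, 0.10], [0.70, 0.30], [0.40, 0.60]]
    print(select_pseudo_labels(main, aux, dynamic_threshold(0, 10)))
```

In this toy run the first node is labeled because of its high confidence, the second because the two encoders agree, and the third is skipped because the encoders disagree and the confidence is below the threshold.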