Contrastive graph clustering via enhanced hard sample mining and cluster-guiding

Meng Li, Bo Yang, Tao Xue, Yiguo Zhang, Liangliang Zhou
DOI: https://doi.org/10.1007/s00530-024-01567-7
IF: 3.9
2024-11-28
Multimedia Systems
Abstract: Contrastive graph clustering (CGC) has become an active research topic; it aims to use the strong representational capability of contrastive learning to improve graph clustering performance. Recent work has shown that CGC can benefit from hard sample mining. However, we observe two primary shortcomings in existing CGC methods that limit further gains in clustering performance. First, the widely used contrastive loss treats every element off the cross-view diagonal as a negative, which yields numerous false negatives. Second, without explicit cluster-guiding, the learned node embeddings are poorly suited to clustering tasks. To address these issues, we propose a novel CGC method based on enhanced hard sample mining and cluster-guiding (CGCEC). The method generates high-confidence pseudo-labels by clustering node embeddings during network training. We further design a hard sample debiased mining loss that uses the pseudo-labels to remove false negative samples, repelling hard negatives while attracting hard positives, and thereby improving the discriminability of the learned embeddings. In addition, we employ the encoder to transform node embeddings into semantic labels and match these semantic labels with the pseudo-labels, which encourages the network to learn node embeddings better suited to clustering. To validate the effectiveness of CGCEC, we compare it with state-of-the-art graph clustering methods on six benchmark datasets. The experimental results confirm the efficacy of our method and its superiority over competing approaches.
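The false-negative problem described in the abstract can be illustrated with a minimal sketch. This is not the CGCEC loss from the paper: it is a plain cross-view InfoNCE-style loss in NumPy in which pairs sharing a high-confidence pseudo-label are dropped from the negatives. The function name `debiased_contrastive_loss`, the `confident` mask, and the temperature `tau` are assumptions made for illustration, and the hard-sample weighting mentioned in the abstract is omitted.

```python
import numpy as np

def debiased_contrastive_loss(z1, z2, pseudo_labels, confident, tau=0.5):
    """Illustrative cross-view contrastive loss with false negatives removed.

    z1, z2        : (n, d) embeddings of the same n nodes from two views
    pseudo_labels : (n,) integer cluster assignments (assumed given)
    confident     : (n,) boolean mask of high-confidence pseudo-labels
    tau           : temperature (hypothetical default)
    """
    # L2-normalise so the dot product is a cosine similarity.
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    sim = np.exp(z1 @ z2.T / tau)  # (n, n) cross-view similarities

    # Off-diagonal pairs with the same pseudo-label, both confident,
    # are treated as likely false negatives.
    same = pseudo_labels[:, None] == pseudo_labels[None, :]
    both_conf = confident[:, None] & confident[None, :]
    false_neg = same & both_conf
    np.fill_diagonal(false_neg, False)  # diagonal holds the true positives

    pos = np.diag(sim)
    # Denominator: the positive plus all negatives not flagged as false.
    denom = np.where(false_neg, 0.0, sim).sum(axis=1)
    return float(np.mean(-np.log(pos / denom)))
```

With no confident pseudo-labels the sketch reduces to the usual loss that treats every off-diagonal element as a negative; marking nodes as confident can only shrink the denominator, so the loss no longer penalises similarity between nodes assumed to share a cluster.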
computer science, information systems, theory & methods