Contrastive author-aware text clustering

Xudong Tang, Chao Dong, Wei Zhang
DOI: https://doi.org/10.1016/j.patcog.2022.108787
IF: 8
2022-01-01
Pattern Recognition
Abstract: In the era of User Generated Content (UGC), authors (IDs) of texts widely exist and play a key role in determining the topic categories of texts. Existing text clustering efforts are mainly attributed to utilizing textual information, but the effect of authors on text clustering remains largely underexplored. To mitigate this issue, we propose a novel Contrastive Author-aware Text clustering approach, dubbed as CAT. CAT injects author information not only in characterizing texts through representations but also in pushing or pulling text representations of different authors through contrastive learning, which is rarely adopted by text clustering. Specifically, the developed contrastive learning method conducts both cluster-instance contrast by the text representation augmentation and instance-instance contrast by the multi-view representations. We perform comprehensive experiments on three public datasets, demonstrating that CAT largely outperforms strong competitive text clustering baselines and validating the effectiveness of the CAT's main components. (c) 2022 Elsevier Ltd. All rights reserved.