Topic Detection Based On Semantics, Time And Social Relationship

Pengchao Cheng, Junping Du, Feifei Kou, Zhe Xue, Peihua Chen
DOI: https://doi.org/10.1007/978-981-32-9050-1_78
2020-01-01
Abstract: Short-text sparsity, colloquial language, and polysemy are the main obstacles when processing social network data. They make it hard for traditional methods to capture the true meaning of posts, which in turn makes topic detection difficult. To address these problems, we propose a novel Clustering Algorithm based on Semantics, Time, and Social relationship (CASTS) for topic detection. First, to overcome short-text sparsity and polysemy, CASTS leverages Bidirectional Encoder Representations from Transformers (BERT), which can be pre-trained on large-scale social network short-text data to obtain concise, semantically rich text representations. Second, by combining the short-text representation with time and social relationship information, CASTS detects topics efficiently. Finally, we conduct experiments on a Weibo dataset to verify the correctness and effectiveness of CASTS.
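The abstract does not give the CASTS formulas, so the following is only an illustrative sketch of how semantic, temporal, and social signals might be combined for clustering. Everything in it is an assumption made for illustration: the weighted-sum similarity, the exponential time decay, the follow-graph indicator, the single-pass clustering loop, and all names, weights, and thresholds (`combined_similarity`, `single_pass_cluster`, `w_sem`, `tau_hours`, and so on). The two-dimensional vectors stand in for BERT sentence embeddings.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def combined_similarity(a, b, follows,
                        w_sem=0.6, w_time=0.2, w_soc=0.2, tau_hours=24.0):
    """Hypothetical weighted sum of semantic, temporal, and social similarity."""
    sem = cosine(a["vec"], b["vec"])
    # Assumed time kernel: similarity decays exponentially with the time gap.
    time_sim = math.exp(-abs(a["hour"] - b["hour"]) / tau_hours)
    # Assumed social signal: same author, or a follow edge in either direction.
    related = (a["user"] == b["user"]
               or (a["user"], b["user"]) in follows
               or (b["user"], a["user"]) in follows)
    soc = 1.0 if related else 0.0
    return w_sem * sem + w_time * time_sim + w_soc * soc

def single_pass_cluster(posts, follows, threshold=0.7):
    """Assign each post to the most similar existing cluster, or open a new one."""
    clusters = []
    for post in posts:
        best, best_score = None, threshold
        for cluster in clusters:
            # Average similarity between the post and the cluster's members.
            score = sum(combined_similarity(post, m, follows)
                        for m in cluster) / len(cluster)
            if score >= best_score:
                best, best_score = cluster, score
        if best is None:
            clusters.append([post])
        else:
            best.append(post)
    return clusters

# Toy data; "vec" would be a BERT embedding in practice.
posts = [
    {"id": 1, "user": "u1", "hour": 0, "vec": [1.0, 0.0]},
    {"id": 2, "user": "u2", "hour": 1, "vec": [0.9, 0.1]},
    {"id": 3, "user": "u3", "hour": 100, "vec": [0.0, 1.0]},
]
follows = {("u2", "u1")}  # u2 follows u1
cluster_ids = [[p["id"] for p in c] for c in single_pass_cluster(posts, follows)]
```

With this toy data, posts 1 and 2 are close in meaning and time and are linked by a follow edge, so they fall into one cluster, while post 3 opens its own. The weights and threshold would need tuning on real data, and the paper's actual combination rule may differ.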