A topic-enhanced Dirichlet model for short text stream clustering
Kan Liu, Jiarui He, Yu Chen
DOI: https://doi.org/10.1007/s00521-024-09480-w
2024-03-03
Neural Computing and Applications
Abstract: Short text streams, such as social media comments, are generated continuously, so effective clustering methods are essential for extracting valuable information from them. However, existing research does not address the problem of topic concentration in clustering. As a result, multiple topics become mixed within a single cluster, which makes it difficult to summarize the cluster's center. To tackle this issue, this paper proposes TEDM, a novel topic-enhanced clustering method based on the Dirichlet model. TEDM performs dynamic clustering and uses topic information to improve document sampling, so that documents on the same topic are clustered together more reliably. To extract topic terms, TEDM constructs a dynamic word relation graph that is updated as documents arrive in the stream, allowing it to cope with changes in topics over time. Extensive experiments on multiple real-world datasets show that TEDM outperforms state-of-the-art methods.
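The abstract does not specify how the word relation graph is built or how topic terms are read from it. As a rough illustration only, the Python sketch below assumes the graph is a word co-occurrence graph whose edge weights count the documents in which two words appear together, updated incrementally as each document arrives, and that a cluster's topic terms are the words with the highest weighted degree within that cluster's own vocabulary. The class `WordRelationGraph`, its methods, and the example stream are hypothetical and are not TEDM's actual construction.

```python
from collections import defaultdict
from itertools import combinations


class WordRelationGraph:
    """Hypothetical sketch of a word co-occurrence graph updated per document.

    Assumption (not from the paper): edge weight = number of documents in
    which the two words co-occur.
    """

    def __init__(self):
        # weights[u][v] holds the co-occurrence count of words u and v
        self.weights = defaultdict(lambda: defaultdict(int))

    def update(self, tokens):
        # Count each unordered pair of distinct words once per document
        for u, v in combinations(sorted(set(tokens)), 2):
            self.weights[u][v] += 1
            self.weights[v][u] += 1

    def weighted_degree(self, word, vocab=None):
        # Sum of edge weights from `word`, optionally restricted to `vocab`
        neighbours = self.weights.get(word, {})
        return sum(w for n, w in neighbours.items() if vocab is None or n in vocab)

    def topic_terms(self, vocab, k=3):
        # Rank a cluster's words by connection strength inside the cluster;
        # ties are broken alphabetically for deterministic output
        vocab = set(vocab)
        ranked = sorted(vocab, key=lambda w: (-self.weighted_degree(w, vocab), w))
        return ranked[:k]


# Usage: feed a small illustrative stream one document at a time
g = WordRelationGraph()
stream = [
    ["earthquake", "tsunami", "warning"],
    ["earthquake", "tsunami", "coast"],
    ["earthquake", "magnitude"],
    ["football", "match", "goal"],
]
for doc in stream:
    g.update(doc)

cluster_vocab = {"earthquake", "tsunami", "warning", "coast", "magnitude"}
print(g.topic_terms(cluster_vocab, k=2))  # ['earthquake', 'tsunami']
```

Because the graph only grows here, it never forgets old word relations; a streaming system that must follow topic drift would presumably also need some way to age out stale edges (for example a sliding window or decay), which this sketch deliberately leaves out.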
computer science, artificial intelligence