A New Text Clustering Method Using Hidden Markov Model

Yan Fu, Dongqing Yang, Shiwei Tang, Tengjiao Wang, Aiqiang Gao
DOI: https://doi.org/10.1007/978-3-540-73351-5_7
2007-01-01
Abstract: Because text data is high-dimensional and semantically interrelated, text clustering remains an important topic in data mining. However, little work has investigated the attributes of the clustering process itself; previous studies focused only on the characteristics of the text. Viewing text clustering as a dynamic, sequential process, we describe it as a series of state transitions of words or documents. Taking the K-means clustering method as an example, we parse the clustering process into several sequences. Building on research into sequential and temporal data clustering, we propose a new text clustering method using an HMM (Hidden Markov Model). Experiments on Reuters-21578 show that this approach produces an accurate clustering partition and achieves better performance than the K-means algorithm.
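The abstract does not say exactly how the K-means process is parsed into sequences. As a rough illustration only, the sketch below assumes one plausible reading: run K-means, record each document's cluster label after every assignment step (one label sequence per document), then count label-to-label transitions to estimate a first-order Markov transition matrix. The function names `kmeans_with_trace` and `transition_matrix` are hypothetical, the toy two-dimensional vectors stand in for document term vectors, and the code estimates a visible Markov chain over cluster labels rather than training a hidden Markov model, so it should not be read as the authors' method.

```python
import random
from typing import List, Tuple

Vector = List[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_with_trace(
    points: List[Vector], k: int, iterations: int, seed: int = 0
) -> Tuple[List[Vector], List[List[int]]]:
    """Plain K-means that also records every point's label at each iteration.

    Hypothetical helper: trace[i] is the sequence of cluster labels that
    point i received, one entry per assignment step.
    """
    rng = random.Random(seed)
    # Initialise centroids with k distinct input points.
    centroids = [list(p) for p in rng.sample(points, k)]
    trace: List[List[int]] = [[] for _ in points]
    for _ in range(iterations):
        # Assignment step: nearest centroid for each point.
        labels = []
        for i, p in enumerate(points):
            label = min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            labels.append(label)
            trace[i].append(label)
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i, lab in enumerate(labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, trace


def transition_matrix(trace: List[List[int]], k: int) -> List[List[float]]:
    """Maximum-likelihood first-order transition probabilities between labels.

    Rows for labels that never appear as a 'from' state are all zeros.
    """
    counts = [[0] * k for _ in range(k)]
    for seq in trace:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


if __name__ == "__main__":
    # Toy "documents": two obvious groups in a 2-term vector space.
    docs = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    _, trace = kmeans_with_trace(docs, k=2, iterations=3)
    print(trace)
    print(transition_matrix(trace, 2))
```

Recording the label after every iteration, rather than only the final partition, is what turns a single clustering run into a set of sequences; an HMM-based method would then model those sequences with hidden states instead of the directly counted transitions used here.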
What problem does this paper attempt to address?