TSDPMM: Incorporating Prior Topic Knowledge into Dirichlet Process Mixture Models for Text Clustering.

Linmei Hu, Juanzi Li, Xiaoli Li, Chao Shao, Xuzhong Wang
DOI: https://doi.org/10.18653/v1/d15-1091
2015-01-01
Abstract: The Dirichlet process mixture model (DPMM) has great potential for detecting the underlying structure of data, and extensive studies have applied it to clustering text by topic. However, because the model is unsupervised, the resulting topic clusters are often less than satisfactory. Considering that people often have some prior knowledge about which topics should exist in a given dataset, we aim to incorporate such knowledge into the DPMM to improve text clustering. We propose a novel model, TSDPMM, based on a new seeded Pólya urn scheme. Experimental results on document clustering across three datasets demonstrate that the proposed TSDPMM significantly outperforms the state-of-the-art DPMM and can be applied in a lifelong learning framework.