Document Clustering Based on Probabilistic Topic Model

WANG Li-dong, WEI Bao-gang, YUAN Jie
DOI: https://doi.org/10.3969/j.issn.0372-2112.2012.11.033
2012-01-01
Abstract: To cluster corpora of ordinary documents and of digital books effectively, clustering algorithms based on the LDA model and on TC-LDA, respectively, are proposed. TC-LDA, an extension of LDA, is a topic model for digital book corpora that models topics jointly from both Texts and Contents. Unlike traditional clustering methods, topic-model-based methods place documents in the same group if they share one or more common topics. Empirical evaluation demonstrates that the proposed topic-analysis approach substantially improves clustering results compared with related methods.
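The grouping rule described in the abstract (documents fall into the same group when they share a topic) can be illustrated with a minimal sketch. This is not the authors' algorithm: the function name `cluster_by_topics`, the `threshold` parameter, the fallback to the most probable topic, and the example topic proportions are all assumptions made for illustration. The document-topic proportions are assumed to have been inferred beforehand by a topic model such as LDA.

```python
from collections import defaultdict


def cluster_by_topics(doc_topic, threshold=0.3):
    """Group documents by the topics they share.

    doc_topic: list of per-document topic proportions (hypothetical input,
    e.g. as inferred by LDA). A document joins the group of every topic
    whose proportion is at least `threshold` (an assumed cutoff). If no
    topic qualifies, the document joins its single most probable topic.
    """
    clusters = defaultdict(list)
    for doc_id, theta in enumerate(doc_topic):
        topics = [k for k, p in enumerate(theta) if p >= threshold]
        if not topics:
            # Fallback: assign to the most probable topic.
            topics = [max(range(len(theta)), key=theta.__getitem__)]
        for k in topics:
            clusters[k].append(doc_id)
    return dict(clusters)


# Made-up topic proportions for four documents over three topics.
doc_topic = [
    [0.80, 0.15, 0.05],
    [0.45, 0.50, 0.05],
    [0.10, 0.10, 0.80],
    [0.05, 0.90, 0.05],
]
clusters = cluster_by_topics(doc_topic)
print(clusters)
```

In this toy input, document 1 has substantial weight on two topics and therefore appears in two groups, which is the main difference from a hard partition such as k-means.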