Design and Development of Partitional Topic Model

Kaiwen ZHOU, Zhihui YANG, Huixin MA, Zhenying HE, Yinan JING, X. WANG
DOI: https://doi.org/10.3778/j.issn.1673-9418.1709034
2018-01-01
Abstract: Topic models are now widely used to analyze documents in data mining. LDA (latent Dirichlet allocation), a simple topic model, has received much attention. However, LDA assumes that each document is generated independently, which neglects the connections between documents. By modeling these connections, this paper develops a new topic model, DbLDA (LDA over text database). DbLDA exploits the partitional structure of text databases (e.g., partitions by time or location) and the commonalities within each subset, and is thus more expressive than the original LDA. Because of the complexity of DbLDA, this paper uses a partially collapsed variational Bayesian method for model inference, which trains quickly. In the experiments, DbLDA and LDA are trained on news datasets. The results show that DbLDA outperforms LDA.
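
For reference, the independence assumption mentioned in the abstract can be illustrated with a minimal Python sketch of the standard LDA generative process. This is not DbLDA, whose details are not given here; the function names, hyperparameter values, and corpus sizes are illustrative assumptions.

```python
import random


def sample_dirichlet(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Return an index drawn with probability proportional to probs."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(num_docs, doc_len, num_topics, vocab_size,
                    alpha=0.5, beta=0.5, seed=0):
    """Generate word-id documents under the standard LDA assumptions.

    alpha and beta are symmetric Dirichlet hyperparameters (assumed values).
    """
    rng = random.Random(seed)
    # Topic-word distributions are shared by the whole corpus.
    topics = [sample_dirichlet([beta] * vocab_size, rng)
              for _ in range(num_topics)]
    corpus = []
    for _ in range(num_docs):
        # Each document draws its own topic mixture independently of the
        # others; this is the assumption the abstract says LDA makes.
        theta = sample_dirichlet([alpha] * num_topics, rng)
        doc = []
        for _ in range(doc_len):
            z = sample_categorical(theta, rng)      # topic of this word
            w = sample_categorical(topics[z], rng)  # word id from topic z
            doc.append(w)
        corpus.append(doc)
    return corpus


corpus = generate_corpus(num_docs=3, doc_len=10, num_topics=2, vocab_size=20)
```

In this sketch, no information is shared between documents beyond the corpus-wide topics; a partition-aware model would add structure shared by the documents within each subset.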