Topic Optimization Method Based on Pointwise Mutual Information

Yuxin Ding, Shengli Yan
DOI: https://doi.org/10.1007/978-3-319-26555-1_17
2015-01-01
Abstract: The Latent Dirichlet Allocation (LDA) model is biased toward drawing high-frequency words to describe topics, which reduces the accuracy of topic representations. To address this issue, we use pointwise mutual information (PMI) to estimate the internal correlation between words and documents, and we propose a PMI-based LDA model that draws the words in a topic according to their mutual information. We also propose three measures for evaluating topic quality: readability, topic consistency, and topic similarity. Experimental results show that the topics generated by the proposed model are of higher quality than those generated by the LDA model.
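The abstract does not give the exact formulation, so the following is only a minimal sketch, assuming the standard definition of PMI between a word w and a document d, PMI(w, d) = log( p(w, d) / (p(w) p(d)) ), with probabilities estimated from token counts. It illustrates why PMI counters the frequency bias: a word spread evenly over all documents scores near zero, while a word concentrated in one document scores higher. The function names and the choice to score each word by its average PMI over the documents containing it are hypothetical illustrations, not the authors' method.

```python
import math
from collections import Counter


def word_document_pmi(docs):
    """Return {(word, doc_index): PMI} for every word occurring in a document.

    Assumes the standard PMI definition with count-based estimates:
    p(w, d) = n(w, d) / N, p(w) = n(w) / N, p(d) = n(d) / N,
    so PMI(w, d) = log(n(w, d) * N / (n(w) * n(d))).
    """
    joint = Counter()        # n(w, d): occurrences of word w in document d
    word_totals = Counter()  # n(w): occurrences of word w in the whole corpus
    doc_totals = []          # n(d): number of tokens in document d
    for d, doc in enumerate(docs):
        doc_totals.append(len(doc))
        for w in doc:
            joint[(w, d)] += 1
            word_totals[w] += 1
    total = sum(doc_totals)  # N: total number of tokens in the corpus
    return {
        (w, d): math.log(n_wd * total / (word_totals[w] * doc_totals[d]))
        for (w, d), n_wd in joint.items()
    }


def rank_words_by_average_pmi(pmi):
    """Hypothetical scoring: rank words by mean PMI over the documents containing them."""
    sums = Counter()
    counts = Counter()
    for (w, _d), value in pmi.items():
        sums[w] += value
        counts[w] += 1
    averages = {w: sums[w] / counts[w] for w in sums}
    # Highest average PMI first; ties broken alphabetically for a stable order.
    return sorted(averages, key=lambda w: (-averages[w], w))


if __name__ == "__main__":
    docs = [["the", "ball", "goal", "the"], ["the", "vote", "party", "the"]]
    pmi = word_document_pmi(docs)
    print(pmi[("the", 0)], pmi[("ball", 0)])
    print(rank_words_by_average_pmi(pmi))
```

In this toy corpus, "the" is the most frequent word yet has PMI 0 in both documents because it is spread evenly, whereas content words such as "ball" reach log 2; a frequency-based ranking would put "the" first, while the PMI-based ranking puts it last.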