ITWF: A Framework to Apply Term Weighting Schemes in Topic Model

Kai Yang, Yi Cai, Ho-fung Leung, Raymond Y. K. Lau, Qing Li
DOI: https://doi.org/10.1016/j.neucom.2019.02.048
IF: 6
2019-01-01
Neurocomputing
Abstract: Topic models such as Latent Dirichlet Allocation (LDA) and its variants are a type of statistical model for discovering latent topics. However, as previous research has shown, some topics generated by LDA may be uninterpretable and semantically incoherent because irrelevant words occur in them. To improve the semantic quality of automatically discovered topics, we explore how words are distributed across topics in order to identify topic-indiscriminate words, which are responsible for the low-quality topics. The main contribution of the research reported in this paper is a novel framework, the Iterative Term Weighting Framework (ITWF), which can effectively identify topic-indiscriminate words and filter them out of the uncovered topics. In particular, the proposed framework first applies entropy-based term weighting schemes and then adopts a novel iterative method to identify topic-indiscriminate words. To the best of our knowledge, our research is among the very few successful works that aim to enhance both the semantic coherence and the interpretability of LDA-based topic modeling methods. The experimental results show that the proposed framework improves the effectiveness of LDA as well as its variants.
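The abstract does not give ITWF's exact weighting formula or its iterative procedure. As a rough illustration of the underlying idea only, the sketch below scores each word by the normalized entropy of its distribution over topics in a trained topic-word matrix. A word spread evenly across all topics gets a score near 1 and becomes a candidate topic-indiscriminate word. The function names (`topic_entropy`, `flag_indiscriminate`), the uniform-topic-prior assumption, the 0.9 threshold and the toy numbers are all hypothetical choices, not the authors' method. The iterative step is not reproduced.

```python
import math


def topic_entropy(phi):
    """Normalized entropy of each word's distribution over topics.

    phi[k][w] is p(w | topic k), e.g. a topic-word matrix from a trained LDA model.
    Assumption: a uniform prior over topics, so p(k | w) is proportional to phi[k][w].
    Requires at least two topics. Returns scores in [0, 1]; 1 means the word is
    spread evenly over all topics (topic-indiscriminate), 0 means it is confined to one.
    """
    num_topics = len(phi)
    num_words = len(phi[0])
    scores = []
    for w in range(num_words):
        column = [phi[k][w] for k in range(num_topics)]
        total = sum(column)
        h = 0.0
        for value in column:
            if value > 0:  # zero-probability topics contribute nothing to entropy
                p = value / total
                h -= p * math.log(p)
        scores.append(h / math.log(num_topics))  # divide by max entropy log(K)
    return scores


def flag_indiscriminate(phi, vocab, threshold=0.9):
    """Hypothetical filter: return words whose normalized topic entropy exceeds threshold."""
    scores = topic_entropy(phi)
    return [word for word, score in zip(vocab, scores) if score > threshold]


# Toy 2-topic, 3-word model (made-up numbers for illustration only).
phi = [
    [0.5, 0.1, 0.4],  # p(w | topic 0)
    [0.0, 0.6, 0.4],  # p(w | topic 1)
]
vocab = ["goal", "vote", "said"]
print(flag_indiscriminate(phi, vocab))  # ['said']
```

In this toy example "said" is equally likely under both topics, so its normalized entropy is 1 and it is flagged. "goal" appears in only one topic and scores 0. Normalizing by log K keeps the threshold comparable across models with different numbers of topics.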
What problem does this paper attempt to address?