Using Word Sense As a Latent Variable in LDA Can Improve Topic Modeling.

Yunqing Xia, Guoyu Tang, Huan Zhao, Erik Cambria, Thomas Fang Zheng
DOI: https://doi.org/10.5220/0004889705320537
2014-01-01
Abstract: Since its introduction, LDA has been used successfully to model text documents. So far, words have been the usual features for inducing latent topics, which are then used in document representation. Observations on documents indicate that polysemous words can make the latent topics less discriminative, resulting in less accurate document representations. We therefore argue that semantically deterministic word senses can improve the quality of the latent topics. In this work, we propose a series of word-sense-aware LDA models that use word sense as an extra latent variable in topic induction. Preliminary experiments on benchmark datasets show that word sense can indeed improve topic modeling.
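The abstract does not specify how word sense enters the model. As an illustration only, the sketch below assumes one simple generative story in which each token's latent topic emits a latent word sense, and that sense emits the observed word (theta -> z -> s -> w). The topic names, sense labels such as `bank#money` and `bank#river`, all probabilities, and every function name are hypothetical; this is not presented as the authors' model.

```python
import random
from typing import Dict, List, Set, Tuple

# Hypothetical toy model (not the paper's): each topic is a distribution over
# word senses, and each sense is a distribution over surface words.
# The polysemous word "bank" is emitted by two different senses.
TOPIC_SENSE: Dict[str, Dict[str, float]] = {
    "finance": {"bank#money": 0.6, "loan#money": 0.4},
    "nature": {"bank#river": 0.5, "water#river": 0.5},
}
SENSE_WORD: Dict[str, Dict[str, float]] = {
    "bank#money": {"bank": 1.0},
    "loan#money": {"loan": 1.0},
    "bank#river": {"bank": 1.0},
    "water#river": {"water": 1.0},
}


def sample_dirichlet(alpha: List[float], rng: random.Random) -> List[float]:
    """Draw one Dirichlet sample by normalizing independent Gamma(a, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def pick(dist: Dict[str, float], rng: random.Random) -> str:
    """Sample one key from a categorical distribution given as {key: prob}."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys], k=1)[0]


def generate_document(n_tokens: int, alpha: float,
                      rng: random.Random) -> List[Tuple[str, str, str]]:
    """Generate (topic, sense, word) triples for one document."""
    topics = list(TOPIC_SENSE)
    theta = sample_dirichlet([alpha] * len(topics), rng)  # per-document topic mixture
    doc = []
    for _ in range(n_tokens):
        z = rng.choices(topics, weights=theta, k=1)[0]  # latent topic
        s = pick(TOPIC_SENSE[z], rng)                   # latent word sense
        w = pick(SENSE_WORD[s], rng)                    # observed word
        doc.append((z, s, w))
    return doc


def topics_for_word(word: str) -> Set[str]:
    """Topics that can generate `word` through any of their senses."""
    return {
        z for z, senses in TOPIC_SENSE.items()
        if any(SENSE_WORD[s].get(word, 0.0) > 0 for s in senses)
    }


def topics_for_sense(sense: str) -> Set[str]:
    """Topics that can generate `sense` directly."""
    return {z for z, senses in TOPIC_SENSE.items() if senses.get(sense, 0.0) > 0}


if __name__ == "__main__":
    rng = random.Random(0)
    for triple in generate_document(8, alpha=0.5, rng=rng):
        print(triple)
    print(topics_for_word("bank"))          # both topics
    print(topics_for_sense("bank#river"))   # a single topic
```

In this toy setup the surface word `bank` is compatible with both topics, while each of its senses points to a single topic. That is the intuition behind the abstract's claim that polysemous words blur topics and that senses can make them more discriminative.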