Topic Models Incorporating Statistical Word Senses

Guoyu Tang, Yunqing Xia, Jun Sun, Min Zhang, Thomas Fang Zheng
DOI: https://doi.org/10.1007/978-3-642-54906-9_13
2014-01-01
Abstract: LDA treats a surface word as identical across all documents and measures the contribution of that surface word to each topic. However, a surface word can behave differently in different contexts: a polysemous word may be used with different senses. Intuitively, disambiguating word senses should make topic models more discriminative. In this work, we propose a joint model that induces document topics and word senses simultaneously. Instead of relying on predefined word sense resources, we capture word sense information through a latent variable and induce it from the corpora in a fully unsupervised manner. Experimental results show that the proposed joint model significantly outperforms both classic LDA and a standalone sense-based LDA model in document clustering.
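The motivation in the abstract can be illustrated with a toy sketch. This is not the authors' model: the two documents and the integer sense ids below are hand-assigned assumptions, whereas the paper induces senses through a latent variable without labels. The sketch only shows why a bag of surface words makes two unrelated documents look like they share vocabulary, while sense-tagged tokens do not.

```python
from collections import Counter

# Toy corpus of (word, sense_id) pairs. The sense ids are hand-assigned
# for illustration only; the paper induces senses in an unsupervised way.
docs = [
    [("bank", 1), ("loan", 1), ("interest", 1)],   # financial context
    [("bank", 2), ("water", 1), ("shore", 1)],     # river context
]

def surface_counts(doc):
    """Bag of surface words: the representation classic LDA works with."""
    return Counter(word for word, _ in doc)

def sense_counts(doc):
    """Bag of word#sense tokens: different senses become different types."""
    return Counter(f"{word}#{sense}" for word, sense in doc)

def shared_types(count_a, count_b):
    """Vocabulary items that occur in both documents, sorted."""
    return sorted(set(count_a) & set(count_b))

surface_overlap = shared_types(surface_counts(docs[0]), surface_counts(docs[1]))
sense_overlap = shared_types(sense_counts(docs[0]), sense_counts(docs[1]))

print("shared surface words:", surface_overlap)
print("shared sense tokens:", sense_overlap)
```

With surface words, both documents contain "bank" and so appear related; once each occurrence carries its sense, the overlap disappears. A sense-aware topic model can exploit exactly this distinction.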