Nonparametric Symmetric Correspondence Topic Models for Multilingual Text Analysis.

Rui Cai, Miaohong Chen, Houfeng Wang
DOI: https://doi.org/10.1007/978-3-319-25207-0_23
2015-01-01
Abstract: Topic models aim to analyze collections of documents and have been widely used in machine learning and natural language processing. Recently, researchers have proposed several topic models for multilingual parallel or comparable documents. Symmetric correspondence Latent Dirichlet Allocation (SymCorrLDA) is one such model. Despite its advantages over some other existing multilingual topic models, it is a classic Bayesian parametric model and therefore inherits the shortcomings of such models; for example, the number of topics must be specified in advance. Motivated by this limitation, we extend SymCorrLDA and propose a Bayesian nonparametric model, NPSymCorrLDA. Experiments on Chinese-English datasets extracted from Wikipedia (https://zh.wikipedia.org/) show significant improvement over SymCorrLDA.