A Fast Algorithm for Posterior Inference with Latent Dirichlet Allocation

Bui Thi-Thanh-Xuan, Vu Van-Tu, Atsuhiro Takasu, Khoat Than
DOI: https://doi.org/10.1007/978-3-319-75420-8_13
2018-01-01
Abstract: Latent Dirichlet Allocation (LDA) [1] is, among the various forms of topic models, an important probabilistic generative model for analyzing large text corpora. Posterior inference for individual texts is particularly important in streaming environments, but it is often intractable in the worst case. To avoid solving this NP-hard problem directly, existing methods for posterior inference are approximate, but they offer no guarantee on either quality or convergence rate. Building on the Online Frank-Wolfe algorithm of Hazan [2] and improving on the Online Maximum a Posteriori Estimation algorithm (OPE) of Than [3, 4], we propose a new, effective algorithm (called NewOPE) that solves posterior inference in topic models by combining the Bernoulli distribution, stochastic bounds, and an approximation function. Our algorithm has more attractive properties than existing inference approaches, including theoretical guarantees on quality and a fast convergence rate. It not only retains the key advantages of OPE but also often outperforms OPE and other existing algorithms. The new algorithm has been used to develop two effective methods for learning topic models from massive or streaming text collections. Experimental results show that our approach is more efficient and robust than state-of-the-art methods.
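As a rough illustration of the Online Frank-Wolfe idea that OPE-style inference builds on, the following minimal sketch estimates the topic proportions of a single document. It is not the NewOPE algorithm, whose details the abstract does not give. The function name `ope_style_inference`, the MAP objective (log-likelihood plus a Dirichlet log-prior with parameter `alpha`), the Bernoulli parameter of 0.5, and the step size 1/(t+1) are all assumptions made for illustration.

```python
import numpy as np


def ope_style_inference(d, beta, alpha=0.5, n_iters=50, seed=0):
    """Hypothetical sketch of Frank-Wolfe-style MAP inference for one document.

    d     : (V,) word counts of the document
    beta  : (K, V) topic-word probabilities (assumed strictly positive)
    alpha : Dirichlet hyperparameter (assumed value)
    Returns theta, a (K,) vector of topic proportions on the simplex.
    """
    rng = np.random.default_rng(seed)
    K = beta.shape[0]
    theta = np.full(K, 1.0 / K)  # start at the centre of the simplex
    n_lik = 0    # how often the likelihood term was drawn
    n_prior = 0  # how often the prior term was drawn

    for t in range(1, n_iters + 1):
        # Bernoulli draw: pick the likelihood or the prior term.
        if rng.random() < 0.5:
            n_lik += 1
        else:
            n_prior += 1

        # Gradient of the stochastic approximation
        # F_t = (2 / t) * (n_lik * log-likelihood + n_prior * log-prior).
        word_probs = theta @ beta                # (V,), strictly positive
        grad_lik = beta @ (d / word_probs)       # (K,)
        grad_prior = (alpha - 1.0) / theta       # (K,)
        grad = (2.0 / t) * (n_lik * grad_lik + n_prior * grad_prior)

        # Frank-Wolfe step: move toward the simplex vertex with the
        # largest gradient coordinate. The step 1/(t+1) keeps theta
        # strictly inside the simplex, so the logs stay finite.
        i = int(np.argmax(grad))
        vertex = np.eye(K)[i]
        theta = theta + (vertex - theta) / (t + 1)

    return theta
```

Each update is a convex combination of the current point and a simplex vertex, so `theta` remains a valid probability vector without any projection step. That projection-free property is what makes Frank-Wolfe-type methods attractive for this kind of constrained inference.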