Sparse Hybrid Variational-Gibbs Algorithm for Latent Dirichlet Allocation.

Ximing Li, Jihong Ouyang, Xiaotang Zhou
DOI: https://doi.org/10.1137/1.9781611974348.82
2016-01-01
Abstract: Topic modeling algorithms such as latent Dirichlet allocation (LDA) play an important role in machine learning research. Fitting LDA with Gibbs sampler-related algorithms involves a sampling process over K topics. The sparsity in LDA can be used to accelerate this expensive topic sampling process, even for very large values of K. However, LDA gradually loses sparsity as the number of documents increases. Motivated by the goal of fast LDA inference with large numbers of both topics and documents, in this paper we propose the novel sparse hybrid variational-Gibbs (SHVG) algorithm. The SHVG algorithm divides the topic sampling probability into a sparse term, whose cost scales linearly with the number of per-document instantiated topics K_d, and a dense term, which is sampled with the Alias method in O(1) time. This leads to a significant improvement in efficiency. Using stochastic optimization techniques, we further develop an online version of SHVG for streaming documents. Experimental results on corpora with a wide range of sizes demonstrate the efficiency and effectiveness of the proposed SHVG algorithm.
Keywords: LDA, sparse sampling, Alias method, online inference
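The Alias method mentioned in the abstract can be illustrated with a minimal, generic sketch. This is not the authors' implementation and does not reproduce the SHVG sparse/dense decomposition; it only shows how a fixed discrete distribution over K outcomes can be preprocessed in O(K) time so that each later draw costs O(1). The function names `build_alias_table` and `alias_draw` are hypothetical and chosen for illustration.

```python
import random


def build_alias_table(probs):
    """Build Walker alias tables for a discrete distribution in O(K) time.

    `probs` holds non-negative weights; they need not sum to 1.
    Hypothetical helper for illustration, not taken from the paper.
    """
    k = len(probs)
    total = float(sum(probs))
    # Rescale so that the average cell mass is exactly 1.
    scaled = [p * k / total for p in probs]
    prob = [0.0] * k   # probability of keeping the cell's own index
    alias = [0] * k    # fallback index used otherwise
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]
        alias[s] = l
        # The large cell donates the mass needed to fill the small cell.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Any leftover cell has mass 1 up to rounding error.
    for i in small + large:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng=random):
    """Draw one index in O(1): pick a cell uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    prob, alias = build_alias_table([0.5, 0.3, 0.2])
    print([alias_draw(prob, alias) for _ in range(10)])
```

The table construction is paid once per distribution, so the approach pays off when many draws are taken from the same (or a slowly changing) dense distribution, which is the setting the abstract describes for the dense term.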