PLDA+: Parallel latent Dirichlet allocation with data placement and pipeline processing

Zhiyuan Liu, Yuzhou Zhang, Edward Y. Chang, Maosong Sun
DOI: https://doi.org/10.1145/1961189.1961198
2011-01-01
Abstract: Previous methods of distributed Gibbs sampling for LDA run into either memory or communication bottlenecks. To improve scalability, we propose four strategies: data placement, pipeline processing, word bundling, and priority-based scheduling. Experiments show that our strategies significantly reduce the unparallelizable communication bottleneck and achieve good load balancing, and hence improve the scalability of LDA.