CuLDA_CGS

Xiaolong Xie,Yun Liang,Xiuhong Li,Wei Tan
DOI: https://doi.org/10.1145/3293883.3301496
2019-01-01
Abstract:GPUs have benefited many ML algorithms. However, we observe that the performance of existing Latent Dirichlet Allocation(LDA) solutions on GPUs are not satisfying. We present CuLDA_CGS, an efficient approach to accelerate large-scale LDA problems. We delicately design workload partition and synchronization mechanism to exploit multiple GPUs. We also optimize the algorithm from the sampling algorithm, parallelization, and data compression perspectives. Experiment evaluations show that compared with the state-of-the-art LDA solutions, CuLDA_CGS outperforms them by a large margin (up to 7.3X) on a single GPU.
What problem does this paper attempt to address?