High Performance LDA Through Collective Model Communication Optimization

Bingjing Zhang, Bo Peng, Judy Qiu
DOI: https://doi.org/10.1016/j.procs.2016.05.300
2016-01-01
Procedia Computer Science
Abstract: Latent Dirichlet Allocation (LDA) is a widely used machine learning technique for big data analysis. The application includes an inference algorithm that iteratively updates a model until it converges. A major challenge is scaling the parallel implementation, because the model is huge and parallel workers must communicate it continually. We identify three important features of the model in parallel LDA computation: (1) the volume of model parameters required for local computation is high; (2) the time complexity of local computation is proportional to the required model size; and (3) the model size shrinks as it converges. By investigating collective and asynchronous methods for model communication in different tools, we discover that optimized collective communication can improve the model update speed, thus allowing the model to converge faster. The performance improvement derives not only from accelerated communication but also from reduced per-iteration computation time as the model size shrinks during convergence. To foster faster model convergence, we design new collective communication abstractions and implement two Harp-LDA applications, lgs and rtt. We compare our new approach with Yahoo! LDA and Petuum LDA, two leading implementations that favor asynchronous communication methods, on a 100-node, 4000-thread Intel Haswell cluster. The experiments show that lgs can reach higher model likelihood with shorter or similar execution time compared with Yahoo! LDA, while rtt can run up to 3.9 times faster than Petuum LDA when achieving similar model likelihood.