Sys-TM: A Fast and General Topic Modeling System
Yingxia Shao, Xupeng Li, Yiru Chen, Lele Yu, Bin Cui
DOI: https://doi.org/10.1109/tkde.2019.2956518
IF: 9.235
2021-06-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Topic models, such as LDA and its variants, are popular probabilistic models for discovering the abstract "topics" that occur in a collection of documents. However, the performance of topic models can vary considerably across workloads, and achieving a well-optimized implementation is nontrivial. In this paper, we systematically study all recently proposed samplers for LDA (AliasLDA, F+LDA, LightLDA, and WarpLDA) and discover a novel system tradeoff by considering the diversity and skewness of workloads. We then propose a hybrid sampler that uses this tradeoff to choose an efficient sampler, and apply it to LDA and its variants, including STM, TOT, and CTM. Finally, we build Sys-TM, a fast and general topic modeling system that provides a unified topic modeling framework by integrating the hybrid sampler. In our empirical studies, the hybrid sampler outperforms the state-of-the-art samplers by up to 2× across various topic models, and, with a carefully engineered implementation, Sys-TM outperforms existing systems by up to 10×.
computer science, information systems, artificial intelligence, engineering, electrical & electronic
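As general background to the samplers named in the abstract: AliasLDA and LightLDA both draw topics from a discrete distribution using alias tables. Building a table takes O(K) time for K topics, and each subsequent draw takes O(1). The sketch below is a generic implementation of Vose's variant of Walker's alias method. It is not Sys-TM's code. The function names `build_alias_table` and `alias_sample` are hypothetical, chosen only for illustration.

```python
import random
from typing import List, Tuple


def build_alias_table(weights: List[float]) -> Tuple[List[float], List[int]]:
    """Build Vose alias tables for sampling index i with probability weights[i] / sum(weights).

    Hypothetical helper for illustration; not taken from Sys-TM.
    """
    n = len(weights)
    total = sum(weights)
    # Scale so that the average bucket height is exactly 1.
    scaled = [w * n / total for w in weights]
    prob = [0.0] * n
    alias = list(range(n))  # default alias points to itself (unused when prob == 1)
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and is topped up by index l.
        prob[s] = scaled[s]
        alias[s] = l
        # Index l donated (1 - scaled[s]) of its mass to bucket s.
        scaled[l] = scaled[l] + scaled[s] - 1.0
        if scaled[l] < 1.0:
            small.append(l)
        else:
            large.append(l)
    # Any leftovers have height 1 (up to floating-point error): they fill their own bucket.
    for i in large + small:
        prob[i] = 1.0
    return prob, alias


def alias_sample(prob: List[float], alias: List[int], rng: random.Random) -> int:
    """Draw one index in O(1): pick a bucket uniformly, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    rng = random.Random(42)
    prob, alias = build_alias_table([1.0, 2.0, 3.0, 4.0])
    counts = [0, 0, 0, 0]
    for _ in range(100_000):
        counts[alias_sample(prob, alias, rng)] += 1
    print([c / 100_000 for c in counts])  # approximately [0.1, 0.2, 0.3, 0.4]
```

Because the O(K) construction cost is only repaid when a table serves many draws, alias-based samplers generally pair a table with Metropolis-Hastings corrections so that the table can be reused while the counts change.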