Copula Guided Parallel Gibbs Sampling for Nonparametric and Coherent Topic Discovery (Extended Abstract)

Lihui Lin, Yanghui Rao, Haoran Xie, Raymond Y.K. Lau, Jian Yin, Fu Lee Wang, Qing Li
DOI: https://doi.org/10.1109/icde55515.2023.00338
2023-01-01
Abstract: In terms of its generative process, the Gamma-Gamma-Poisson Process (G2PP) is equivalent to the Hierarchical Dirichlet Process (HDP), a nonparametric topic model. Because estimating parameters in HDP is computationally expensive, a parallel G2PP was developed to generate topics efficiently via multi-threading. Unfortunately, that model requires the number of topics to be predefined. To address this issue, we first propose a Topic Self-Adaptive Model (TSAM) for nonparametric and parallel topic discovery. In TSAM, a monitor-executor mechanism manages the global topic information using a hierarchical structure of threads. Based on the apparatus of copulas, we further extend TSAM to TSAMcop for coherent topic modeling by exploiting a copula-guided parallel Gibbs sampling algorithm. Extensive experiments validate the effectiveness of both TSAM and TSAMcop.
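The abstract does not spell out how a copula steers the sampler, so the following is only a minimal sketch of the general idea, not the TSAMcop algorithm itself. It assumes a bivariate Gaussian copula with a hypothetical dependence parameter `rho`, and it assumes that the per-word topic probabilities (`probs_a`, `probs_b`) have already been computed by some Gibbs conditional. All function names are illustrative. The copula produces two dependent Uniform(0, 1) variates, and each is pushed through the inverse CDF of its word's topic distribution, so the marginals are preserved while the two draws tend to agree.

```python
import math
import random


def normal_cdf(x):
    """Standard normal CDF, written with math.erf to stay in the stdlib."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def copula_uniform_pair(rho, rng):
    """Draw two dependent Uniform(0, 1) variates from a Gaussian copula.

    rho is the correlation of the underlying bivariate normal (assumed |rho| < 1).
    """
    g1 = rng.gauss(0.0, 1.0)
    g2 = rng.gauss(0.0, 1.0)
    z2 = rho * g1 + math.sqrt(1.0 - rho * rho) * g2  # correlate with g1
    return normal_cdf(g1), normal_cdf(z2)


def inverse_cdf_topic(u, probs):
    """Map a uniform variate to a topic index via the categorical inverse CDF."""
    total = sum(probs)
    acc = 0.0
    for k, p in enumerate(probs):
        acc += p / total
        if u <= acc:
            return k
    return len(probs) - 1  # guard against floating-point round-off


def sample_topic_pair(probs_a, probs_b, rho, rng):
    """Sample topics for two neighbouring words with copula-coupled uniforms."""
    u_a, u_b = copula_uniform_pair(rho, rng)
    return inverse_cdf_topic(u_a, probs_a), inverse_cdf_topic(u_b, probs_b)


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.5, 0.3, 0.2]  # hypothetical conditional topic probabilities
    print([sample_topic_pair(probs, probs, 0.9, rng) for _ in range(5)])
```

With `rho = 0` the two draws are independent, which corresponds to ordinary per-word sampling; larger `rho` makes adjacent words more likely to share a topic, which is one plausible reading of "coherent" here.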