Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

Pantea Koochemeshkian, Nizar Bouguila
DOI: https://doi.org/10.1080/08839514.2024.2403904
IF: 2.777
2024-10-05
Applied Artificial Intelligence
Abstract: Topic modeling, a way to find topics in large volumes of text, has grown with the help of deep learning. This paper presents two novel approaches to topic modeling by integrating embeddings derived from Bert-Topic with the multi-grain clustering topic model (MGCTM). Recognizing the inherent hierarchical and multi-scale nature of topics in corpora, our methods utilize MGCTM to capture topic structures at multiple levels of granularity. We enhance the expressiveness of MGCTM by introducing the Generalized Dirichlet and Beta-Liouville distributions as priors, which provide greater flexibility in modeling topic proportions and capturing richer topic relationships. Comprehensive experiments on various datasets showcase the effectiveness of our proposed models in achieving superior topic coherence and granularity compared to state-of-the-art methods. Our findings underscore the potential of leveraging hybrid architectures, marrying neural embeddings with advanced probabilistic modeling, to push the boundaries of topic modeling.
Computer Science, Artificial Intelligence; Engineering, Electrical & Electronic
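
The abstract names the Generalized Dirichlet distribution as one of the priors over topic proportions. As a rough illustration of why it is more flexible than a standard Dirichlet, the sketch below draws samples using the standard stick-breaking construction with independent Beta variables. It is not the authors' implementation: the function name `sample_generalized_dirichlet` and the parameter values are hypothetical and chosen only for the example.

```python
import numpy as np


def sample_generalized_dirichlet(a, b, size, rng):
    """Draw `size` samples from a Generalized Dirichlet distribution.

    a, b: positive shape parameters of length K (hypothetical example inputs).
    Each sample has K + 1 non-negative components that sum to 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    # Independent stick-breaking fractions V_i ~ Beta(a_i, b_i).
    v = rng.beta(a, b, size=(size, a.shape[0]))
    # Stick left after each break: (1 - V_1), (1 - V_1)(1 - V_2), ...
    remaining = np.cumprod(1.0 - v, axis=1)
    # Stick available before each break: 1, (1 - V_1), ...
    before = np.hstack([np.ones((size, 1)), remaining[:, :-1]])
    # Component i takes fraction V_i of the available stick;
    # the last component takes whatever is left over.
    return np.hstack([v * before, remaining[:, -1:]])


rng = np.random.default_rng(0)
theta = sample_generalized_dirichlet(
    a=[2.0, 3.0, 1.5], b=[4.0, 2.0, 1.0], size=5, rng=rng
)
```

Each row of `theta` is a vector of topic proportions on the simplex. With K pairs of parameters the distribution has 2K free parameters rather than the K + 1 of a Dirichlet, and it reduces to a Dirichlet when b_i = a_{i+1} + b_{i+1} for every i < K.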