A Weighted Topic Modeling Approach Based on Word Embedding

Conghui Yuan, Shengnan Zhang
DOI: https://doi.org/10.1109/AICIT59054.2023.10277734
2023-09-15
Abstract: Topic modeling is commonly used to discover latent semantic structure in corpora from different domains, and it is an essential tool for semantic retrieval, feature extraction, and target classification of large amounts of text. Latent Dirichlet Allocation (LDA), a widely used probabilistic generative topic model, is effective at extracting salient topics. However, LDA relies on the bag-of-words assumption, which ignores the semantic information implied between words and discards word order during topic extraction, resulting in low-quality topics. In this paper, we propose a weighted topic modeling method based on word embedding: it first constructs a directed semantic graph over feature words and calculates a weight for each node, and then improves the LDA topic model by incorporating word embeddings. Experimental results show that the proposed method effectively strengthens contextual semantic association and improves the distinction between topics compared with TF-LDA and ETM.
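The abstract does not say how the directed semantic graph is built or how node weights are calculated, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes edges point from a word to the words that follow it within a small window, and it assumes a PageRank-style power iteration for the node weights. The function names `build_directed_graph` and `node_weights` and the parameters `window`, `damping`, and `iterations` are hypothetical.

```python
from collections import defaultdict


def build_directed_graph(docs, window=2):
    """Build a weighted directed graph over words.

    Assumption (not from the paper): an edge u -> v counts how often
    v occurs within `window` tokens after u in a tokenized document.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for tokens in docs:
        for i, u in enumerate(tokens):
            for v in tokens[i + 1 : i + 1 + window]:
                if u != v:
                    graph[u][v] += 1
    return graph


def node_weights(graph, damping=0.85, iterations=50):
    """Compute PageRank-style node weights by power iteration.

    Assumption (not from the paper): node importance is modeled as
    weighted PageRank; the returned weights sum to 1.
    """
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    if not nodes:
        return {}

    n = len(nodes)
    rank = {w: 1.0 / n for w in nodes}
    # Total outgoing edge weight per node; nodes without out-edges are dangling.
    out_total = {u: sum(graph[u].values()) for u in nodes if u in graph}

    for _ in range(iterations):
        new_rank = {w: (1.0 - damping) / n for w in nodes}
        dangling_mass = sum(rank[u] for u in nodes if out_total.get(u, 0) == 0)
        for u in nodes:
            total = out_total.get(u, 0)
            if total == 0:
                continue
            for v, count in graph[u].items():
                new_rank[v] += damping * rank[u] * count / total
        # Spread the rank of dangling nodes uniformly over all nodes.
        for w in nodes:
            new_rank[w] += damping * dangling_mass / n
        rank = new_rank
    return rank
```

Called on a list of tokenized documents, `build_directed_graph` returns nested edge counts and `node_weights` returns one weight per word. One plausible use, again an assumption, is to scale each word's term count by its weight before fitting LDA, so that words central to the graph contribute more to topic estimation.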
Computer Science