BCTM: A Topic Modeling Method Based on External Information

Gang Liu, TaiYing Wan, JingQi Gao, Wray Buntine
DOI: https://doi.org/10.2139/ssrn.4112971
2022-01-01
SSRN Electronic Journal
Abstract: Topic models are often used as intermediate algorithms for text mining and semantic analysis in natural language processing and have a wide range of applications. However, most existing improvements to topic models use word embeddings to improve the accuracy of text modeling but ignore the external information in the text. To address this problem, this paper proposes BCTM (Bi-Concept Topic Model), a topic model that uses both word feature information and concept information. Building on the BTM topic model, BCTM introduces word feature information through word vectors and concept information from ConceptNet to optimize topic modeling. A method for constructing Bi-Concept pairs is proposed based on the ConceptNet semantic network, and the content of the text is enriched with concept information. The improved topic model yields a more accurate topic distribution; at the same time, owing to the rich feature information, the model also outperforms the baseline model in short text modeling. Experiments show that the Bi-Concept topic model proposed in this paper performs well in modeling accuracy.
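For orientation, the BTM model that BCTM builds on works with biterms, that is, unordered pairs of words that co-occur in the same short text. The following is a minimal sketch of standard biterm extraction only; it is not the paper's Bi-Concept pair construction, which the abstract does not specify. The function names `extract_biterms` and `count_biterms` and the toy documents are hypothetical and chosen for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return all unordered word pairs (biterms) from one short text.

    Illustrative helper, not taken from the paper.
    """
    # Sort each pair so that (a, b) and (b, a) count as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(docs):
    """Count biterms over a corpus of already tokenized short texts."""
    counts = Counter()
    for tokens in docs:
        counts.update(extract_biterms(tokens))
    return counts


# Toy corpus (assumed data, for illustration only).
docs = [["topic", "model", "text"], ["topic", "text"]]
biterm_counts = count_biterms(docs)
```

Treating each short text as a single co-occurrence window is the usual simplification for short documents; a model that enriches the text with external concepts would presumably extend the token list before pairing, but that step is an assumption and is not shown here.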