A Semantic Embedding Enhanced Topic Model for User-Generated Textual Content Modeling in Social Ecosystems

Peng Zhang, Baoxi Liu, Tun Lu, Hansu Gu, Xianghua Ding, Ning Gu
DOI: https://doi.org/10.1093/comjnl/bxac091
2022-01-01
The Computer Journal
Abstract: The development of Information and Communication Technologies (ICT) and Web 2.0 has promoted the emergence of diverse social ecosystems such as the social Internet of Things (IoT), social media and online communities. User-generated textual content (UGTC), which consists of unstructured text, is the most important and most common type of user-generated content in social ecosystems. UGTC in social ecosystems is generated according to two types of context information: global context (topics) and local context (semantic regularities). For UGTC modeling, topic models consider only global context and ignore semantic regularities, while semantic embedding models do the opposite. Using either kind of model alone therefore captures only one of the two contexts. To address this problem, we propose a semantic embedding enhanced topic model named SEE-Twitter-LDA for accurately modeling UGTC in social ecosystems. The core idea of SEE-Twitter-LDA is that words are generated according to the mutual semantic information of topics and semantic regularities, so that global context and local context are jointly considered for UGTC modeling. Using 553 098 tweets sampled from Twitter and 211 233 posts sampled from Weibo, we show that SEE-Twitter-LDA outperforms existing related models on perplexity, topic divergence and topic coherence.