Abstract: People increasingly use short texts to express opinions on social media platforms such as Twitter, Facebook, and YouTube, and to share purchasing experiences on e-commerce websites such as Amazon and Flipkart. Every day, billions of short texts are created worldwide as tweets, tags, keywords, search queries, and so on. However, short texts carry little contextual information and tend to be ambiguous, sparse, and noisy, which remains a major challenge. State-of-the-art topic modeling strategies such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are not well suited to short texts, because each document contains only a limited number of words. This work proposes a new model named G_SeaNMF (Gensim_SeaNMF) that improves the word-context semantic relationship by using local and global word embedding techniques. Word embeddings learned from a large corpus provide general semantic and syntactic information about words; they can guide topic modeling for short text collections by supplementing sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is combined with the word2vec model of the Gensim library to strengthen the semantic relationships between words. In this article, short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation, and global word co-occurrence are also explored. These are evaluated using several measures of cluster coherence on real-world datasets such as Search Snippet, Biomedicine, Pascal Flickr, Tweet, and TagMyNews. Empirical evaluation shows that combining local and global word embeddings yields more appropriate words under each topic and improved results.
