Abstract: Nowadays, people use short texts to express their opinions on social media platforms such as Twitter, Facebook, and YouTube, and on e-commerce websites such as Amazon and Flipkart to share their purchasing experiences. Every day, billions of short texts are created worldwide in the form of tweets, tags, keywords, search queries, etc. However, short texts carry little contextual information and can be ambiguous, sparse, and noisy, which makes modeling them a major challenge. State-of-the-art topic modeling strategies such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are not well suited to this setting, because each document contains only a limited number of words and its word co-occurrence patterns are therefore sparse. This work proposes a new model named G_SeaNMF (Gensim_SeaNMF) that improves the word-context semantic relationship by using local and global word embedding techniques. Word embeddings learned from a large corpus provide general semantic and syntactic information about words; they can guide topic modeling for short text collections by supplementing sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is combined with the word2vec model of the Gensim library to strengthen the semantic relationships between words. This article also explores short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation, and global word co-occurrence. These are evaluated with several measures of cluster coherence on real-world datasets such as Search Snippet, Biomedicine, Pascal Flickr, Tweet, and TagMyNews. Empirical evaluation shows that combining local and global word embeddings yields more appropriate words under each topic and improves results.
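To make the idea of factorizing short-text term counts jointly with a word-context correlation matrix more concrete, here is a minimal, stdlib-only sketch under stated assumptions. It is not the authors' G_SeaNMF implementation. It builds a document-term matrix X and a local shifted-PPMI word-context matrix S (treating each short text as one context window), optionally mixes S with a global similarity matrix through a hypothetical `blend` helper, and minimizes ||X - H W^T||^2 + alpha * ||S - W Wc^T||^2 over non-negative W (word-topic), Wc (context-topic), and H (document-topic). All function names, the blending rule, and the parameter `beta` are illustrative assumptions. The multiplicative-update solver is a swapped-in technique, not necessarily the one used by SeaNMF.

```python
import math
import random
from collections import Counter
from itertools import combinations

def transpose(A):
    return [list(row) for row in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def mu_step(M, num, den, eps=1e-9):
    # Element-wise multiplicative update M * num / den; keeps entries >= 0.
    return [[m * n / (d + eps) for m, n, d in zip(rm, rn, rd)]
            for rm, rn, rd in zip(M, num, den)]

def doc_term_matrix(docs, vocab):
    # X[d][i] = count of vocabulary word i in short text d.
    idx = {w: i for i, w in enumerate(vocab)}
    X = [[0.0] * len(vocab) for _ in docs]
    for d, doc in enumerate(docs):
        for w in doc:
            if w in idx:
                X[d][idx[w]] += 1.0
    return X

def sppmi_matrix(docs, vocab, shift=1.0):
    # Local word-context correlation as shifted positive PMI. Each short text is
    # treated as one context window (an assumption); shift=1.0 gives plain PPMI.
    idx = {w: i for i, w in enumerate(vocab)}
    pairs = Counter()
    for doc in docs:
        words = sorted({idx[w] for w in doc if w in idx})
        for a, b in combinations(words, 2):
            pairs[(a, b)] += 1
            pairs[(b, a)] += 1
    total = sum(pairs.values())
    marginal = Counter()
    for (i, _), c in pairs.items():
        marginal[i] += c
    S = [[0.0] * len(vocab) for _ in vocab]
    for (i, j), c in pairs.items():
        pmi = math.log(c * total / (marginal[i] * marginal[j]))
        S[i][j] = max(pmi - math.log(shift), 0.0)
    return S

def blend(S_local, S_global, beta=0.5):
    # Hypothetical local + global mix: S_global might hold cosine similarities of
    # pretrained word vectors; negative similarities are clipped to 0.
    return [[a + beta * max(b, 0.0) for a, b in zip(ra, rb)]
            for ra, rb in zip(S_local, S_global)]

def objective(X, S, W, Wc, H, alpha):
    # ||X - H W^T||_F^2 + alpha * ||S - W Wc^T||_F^2
    R1, R2 = matmul(H, transpose(W)), matmul(W, transpose(Wc))
    f1 = sum((x - r) ** 2 for rx, rr in zip(X, R1) for x, r in zip(rx, rr))
    f2 = sum((s - r) ** 2 for rs, rr in zip(S, R2) for s, r in zip(rs, rr))
    return f1 + alpha * f2

def joint_nmf(X, S, k, alpha=1.0, iters=200, seed=0):
    # Lee-Seung style multiplicative updates for the joint objective above.
    # This is a stand-in solver, not necessarily the one SeaNMF uses.
    rng = random.Random(seed)

    def init(rows):
        return [[rng.random() + 0.1 for _ in range(k)] for _ in range(rows)]

    W, Wc, H = init(len(S)), init(len(S)), init(len(X))  # word, context, doc factors
    Xt, St = transpose(X), transpose(S)
    for _ in range(iters):
        # H <- H * (X W) / (H W^T W)
        H = mu_step(H, matmul(X, W), matmul(H, matmul(transpose(W), W)))
        # W <- W * (X^T H + alpha S Wc) / (W (H^T H + alpha Wc^T Wc))
        num = [[a + alpha * b for a, b in zip(r1, r2)]
               for r1, r2 in zip(matmul(Xt, H), matmul(S, Wc))]
        gram = [[a + alpha * b for a, b in zip(r1, r2)]
                for r1, r2 in zip(matmul(transpose(H), H), matmul(transpose(Wc), Wc))]
        W = mu_step(W, num, matmul(W, gram))
        # Wc <- Wc * (S^T W) / (Wc W^T W)
        Wc = mu_step(Wc, matmul(St, W), matmul(Wc, matmul(transpose(W), W)))
    return W, Wc, H

def top_words(W, vocab, n=3):
    # Highest-weighted words in each column (topic) of the word-topic matrix W.
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda w: -W[w][t])[:n]]
            for t in range(len(W[0]))]

if __name__ == "__main__":
    docs = [["apple", "iphone", "price"], ["iphone", "screen", "apple"],
            ["goal", "team", "match"], ["team", "coach", "goal"]]
    vocab = sorted({w for d in docs for w in d})
    X, S = doc_term_matrix(docs, vocab), sppmi_matrix(docs, vocab)
    W, Wc, H = joint_nmf(X, S, k=2, iters=300)
    print(top_words(W, vocab, n=3))
```

A global component could be tried by filling `S_global` from a trained Gensim model, for example with `model.wv.similarity(a, b)` for each vocabulary pair, and passing it through `blend` before `joint_nmf`. How G_SeaNMF actually combines the two sources is not specified here, so both the clipping of negative similarities and the weight `beta` are assumptions to tune. Because the W update is an ordinary NMF update on the stacked data [X^T | sqrt(alpha) S], each block step should not increase the objective.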

Constructing Pseudo Documents With Semantic Similarity For Short Text Topic Discovery

Short Text Understanding by Leveraging Knowledge into Topic Model

Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Biterm Pseudo Document Topic Model for Short Text

Utilizing Recurrent Neural Network for Topic Discovery in Short Text Scenarios

TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement

Topic Modeling over Short Texts by Incorporating Word Embeddings

A Nested Chinese Restaurant Topic Model for Short Texts with Document Embeddings

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

Semantic Visualization for Short Texts with Word Embeddings

Research on Improve Topic Representation over Short Text

A biterm topic model for short texts

Modeling over Short Texts

Short Text Topic Modeling With Flexible Word Patterns

Don't Forget the Quantifiable Relationship between Words: Using Recurrent Neural Network for Short Text Topic Discovery

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts

Semantic Component Analysis: Discovering Patterns in Short Texts Beyond Topics

A Semantic Embedding Enhanced Topic Model for User-Generated Textual Content Modeling in Social Ecosystems

Short text topic modelling using local and global word-context semantic correlation