Abstract: People increasingly use short texts to express their opinions on social media platforms such as Twitter, Facebook, and YouTube, and to share their purchasing experiences on e-commerce websites such as Amazon and Flipkart. Every day, billions of short texts are created worldwide as tweets, tags, keywords, search queries, and so on. However, short texts carry little contextual information and tend to be ambiguous, sparse, and noisy, which remains a major challenge. State-of-the-art topic modeling strategies such as Latent Dirichlet Allocation and Probabilistic Latent Semantic Analysis are not well suited to such data because each document contains only a limited number of words. This work proposes a new model named G_SeaNMF (Gensim_SeaNMF) that improves the word-context semantic relationship by using local and global word embedding techniques. Word embeddings learned from a large corpus provide general semantic and syntactic information about words; they can guide topic modeling for short text collections by supplementing sparse co-occurrence patterns. In the proposed model, SeaNMF (Semantics-assisted Non-negative Matrix Factorization) is combined with the word2vec model of the Gensim library to strengthen the semantic relationships between words. This article also explores short text topic modeling techniques based on DMM (Dirichlet Multinomial Mixture), self-aggregation, and global word co-occurrence. These techniques are evaluated using different measures to gauge cluster coherence on real-world datasets such as Search Snippet, Biomedicine, Pascal Flickr, Tweet, and TagMyNews. Empirical evaluation shows that combining local and global word embeddings yields more appropriate words under each topic and improved outcomes.
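The idea of pairing local co-occurrence statistics with global embedding similarity can be illustrated with a minimal sketch. It is not the G_SeaNMF implementation. It assumes that each short text acts as a single context window, that the local signal is a positive PMI score, and that the global signal is a clipped cosine similarity between pretrained vectors. The function names, the `alpha` blend weight, and the plain-dict `embeddings` argument (standing in for vectors that would come from a word2vec model) are all hypothetical; the abstract does not specify how the two signals are combined.

```python
import math
from collections import Counter
from itertools import combinations


def ppmi_scores(docs, shift=1.0):
    """Local signal: positive (shifted) PMI, one short text = one context window."""
    pair_counts = Counter()
    for doc in docs:
        tokens = sorted(set(doc.split()))
        for w, c in combinations(tokens, 2):
            pair_counts[(w, c)] += 1
            pair_counts[(c, w)] += 1  # keep the scores symmetric
    total = sum(pair_counts.values())
    word_counts = Counter()
    for (w, _), n in pair_counts.items():
        word_counts[w] += n
    scores = {}
    for (w, c), n in pair_counts.items():
        pmi = math.log(n * total / (word_counts[w] * word_counts[c]))
        value = pmi - math.log(shift)
        if value > 0:  # keep only positive associations
            scores[(w, c)] = value
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def blended_correlation(local, embeddings, alpha=0.5):
    """Hypothetical blend: alpha * local PPMI + (1 - alpha) * clipped cosine."""
    words = sorted(embeddings)
    pairs = set(local) | {(w, c) for w in words for c in words if w != c}
    blended = {}
    for w, c in pairs:
        global_sim = 0.0
        if w in embeddings and c in embeddings:
            global_sim = max(cosine(embeddings[w], embeddings[c]), 0.0)
        blended[(w, c)] = alpha * local.get((w, c), 0.0) + (1 - alpha) * global_sim
    return blended


if __name__ == "__main__":
    docs = ["apple banana", "apple banana", "car engine"]
    toy_vectors = {"apple": [1.0, 0.0], "banana": [1.0, 0.0], "car": [0.0, 1.0]}
    local = ppmi_scores(docs)
    for pair, score in sorted(blended_correlation(local, toy_vectors).items()):
        print(pair, round(score, 3))
```

In a fuller pipeline, a word-by-word matrix built from such scores would be factorized alongside the term-document matrix, which is the role the semantic correlation matrix plays in SeaNMF-style models.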
