Abstract: Due to the popularity of social networks such as microblogs and Twitter, a vast amount of short text data is created every day. Research on short text, such as topic inference, has become increasingly significant. The biterm topic model (BTM) exploits the word co-occurrence patterns of the corpus, which makes it perform better than conventional topic models at uncovering latent semantic relevance in short text. However, BTM relies on Gibbs sampling to infer topics, which is very time consuming, especially for large-scale datasets or when the number of topics is extremely large: each sample requires O(K) operations, where K denotes the number of topics in the corpus. In this paper, we propose FastBTM, an accelerated algorithm for BTM that uses an efficient sampling method and converges much faster than BTM without degrading topic quality. FastBTM is based on the Metropolis-Hastings and alias methods, both of which have been widely adopted for the Latent Dirichlet Allocation (LDA) model and have achieved outstanding speedups. FastBTM reduces the sampling complexity of the biterm topic model from O(K) to O(1) amortized time. We carry out experiments on three datasets: two short text datasets, the Tweets2011 Collection and Yahoo! Answers, and one long document dataset, Enron. Our experimental results show that the gap in running time between FastBTM and BTM widens as the number of topics K increases. In addition, FastBTM is effective for both short text and long document datasets. (C) 2017 Elsevier B.V. All rights reserved.
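To make the O(K)-to-O(1) claim concrete, the following is a minimal sketch of the alias method (in Vose's formulation) for drawing from a fixed discrete distribution over K topics. It is an illustration only, not the FastBTM implementation: the function names `build_alias_table` and `alias_draw` and the example weights are assumptions introduced here. Building the table costs O(K), but each subsequent draw costs O(1). In alias-based samplers for LDA, a table built from slightly outdated counts is typically reused as a proposal distribution and a Metropolis-Hastings acceptance step corrects for the staleness; that correction step is not shown.

```python
import random


def build_alias_table(weights):
    """Build alias tables for a discrete distribution in O(K) time.

    `weights` are unnormalized, non-negative topic weights (hypothetical input).
    Returns (prob, alias): column i keeps outcome i with probability prob[i]
    and otherwise yields alias[i].
    """
    k = len(weights)
    total = float(sum(weights))
    # Scale so that the average column height is exactly 1.
    scaled = [w * k / total for w in weights]
    prob = [0.0] * k
    alias = [0] * k
    small = [i for i, p in enumerate(scaled) if p < 1.0]
    large = [i for i, p in enumerate(scaled) if p >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        prob[s] = scaled[s]   # column s keeps its own mass ...
        alias[s] = l          # ... and is topped up by outcome l
        scaled[l] = scaled[l] + scaled[s] - 1.0
        (small if scaled[l] < 1.0 else large).append(l)
    # Leftover columns are full (up to rounding error).
    for i in large + small:
        prob[i] = 1.0
        alias[i] = i
    return prob, alias


def alias_draw(prob, alias, rng=random):
    """Draw one outcome in O(1): pick a column, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]


if __name__ == "__main__":
    # Hypothetical unnormalized weights for K = 4 topics.
    prob, alias = build_alias_table([1, 2, 3, 4])
    rng = random.Random(0)
    print([alias_draw(prob, alias, rng) for _ in range(10)])
```

The design trade-off is that the table is only exact for the weights it was built from, which is why samplers of this kind amortize the O(K) rebuild over many O(1) draws.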

A biterm topic model for short texts

BTM: Topic Modeling over Short Texts

Short Text Topic Modeling With Flexible Word Patterns

SBTM: A Joint Sentiment and Behaviour Topic Model for Online Course Discussion Forums

Short Text Understanding by Leveraging Knowledge into Topic Model.

A Joint Model of Extended LDA and IBTM over Streaming Chinese Short Texts

Bag of biterms modeling for short texts

Topic Discovery for Streaming Short Texts with CTM.

Biterm Pseudo Document Topic Model for Short Text

User Based Aggregation for Biterm Topic Model

Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts

FastBTM: Reducing the Sampling Time for Biterm Topic Model

Stochastic Divergence Minimization for Biterm Topic Model

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Topic Modeling over Short Texts by Incorporating Word Embeddings

GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model.

CS-BTM: a semantics-based hot topic detection method for social network

A Probabilistic Model For Bursty Topic Discovery In Microblogs

Short Text Topic Modeling Techniques, Applications, and Performance: A Survey

TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement