Abstract: Topic models have played a pivotal role in analyzing large collections of complex data. Besides discovering latent semantics, supervised topic models (STMs) can make predictions on unseen test data. Combining STMs with advanced learning techniques has dramatically enhanced their predictive strength; a prominent example is max-margin supervised topic models, state-of-the-art methods that integrate max-margin learning with topic models. Though powerful, max-margin STMs pose a hard, non-smooth learning problem. Existing algorithms rely on solving multiple latent SVM subproblems in an EM-type procedure, which can be too slow for large-scale categorization tasks. In this paper, we present a highly scalable approach to building max-margin supervised topic models. Our approach builds on three key innovations: 1) a new formulation of Gibbs max-margin supervised topic models for both multi-class and multi-label classification; 2) a simple "augment-and-collapse" Gibbs sampling algorithm that makes no restrictive assumptions on the posterior distributions; 3) an efficient parallel implementation that can easily tackle data sets with hundreds of categories and millions of documents. Furthermore, our algorithm does not need to solve SVM subproblems. Although our methods perform the two tasks of topic discovery and learning predictive models jointly, which significantly improves classification performance, their scalability is comparable to that of state-of-the-art parallel algorithms for standard LDA topic models, which perform only the single task of topic discovery. Finally, an open-source implementation is provided at: http://www.ml-thu.net/~jun/medlda.

Sparse online topic models

Short Text Understanding by Leveraging Knowledge into Topic Model

Efficient Probabilistic Latent Semantic Analysis with Sparsity Control

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

STMLRC: Sparse Topic Model with Low Rank Constraint

Group Sparse Topical Coding

Parsimonious Topic Models with Salient Word Discovery

Sparse Relational Topic Models for Document Networks

Online Subset Topic Modeling For Interactive Documents Exploration

Conditional topical coding: an efficient topic model conditioned on rich features

Topic Modeling in Semantic Space with Keywords

Topic-weak-correlated Latent Dirichlet Allocation

Efficient Topic Modeling on Phrases via Sparsity

A Topic Model for Co-Occurring Normal Documents and Short Texts

Sparseness-constrained Nonnegative Tensor Factorization for Detecting Topics at Different Time Scales

Efficient Methods for Incorporating Knowledge into Topic Models

Locally discriminative topic modeling

Topic model based on co-occurrence word networks for unbalanced short text datasets

On Modelling Non-Linear Topical Dependencies

Scalable Inference in Max-Margin Topic Models

Topic Discovery for Streaming Short Texts with CTM