Abstract: We propose selective supervised Latent Dirichlet Allocation (ssLDA) to improve the prediction performance of the widely studied supervised probabilistic topic models. For each word in a given document, we introduce a Bernoulli distribution that selects the word as either strongly or weakly discriminative with respect to its assigned topic. The Bernoulli distribution is parameterized by the discrimination power of the word for its assigned topic. As a result, the document is represented as a "bag-of-selective-words", offering a new perspective compared with the probabilistic "bag-of-topics" of topic modeling and the flat "bag-of-words" of traditional natural language processing. Inheriting the general framework of supervised LDA (sLDA), ssLDA can also predict many types of response specified by a generalized linear model (GLM). In this paper we focus on applying the word selection mechanism to single-label document classification: we use variational inference to approximate the intractable posterior and derive maximum-likelihood estimates of the parameters in ssLDA. Experiments on textual documents show that ssLDA not only performs competitively with state-of-the-art classification approaches based on both the flat "bag-of-words" and the probabilistic "bag-of-topics" representations, but can also discover the discrimination power of the words within each topic, in a way that is consistent with our prior knowledge.
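The word selection mechanism described above can be illustrated with a minimal sketch. The abstract does not spell out the full generative process, so everything below is an assumption made for illustration: the parameter names (`alpha`, `beta`, `psi`), the helper functions, and the choice to summarize a document by counting topic assignments of selected words only. It is not the authors' implementation, and it omits the response model and inference entirely.

```python
import random


def sample_categorical(probs, rng):
    """Draw an index from a discrete distribution given as a list of probabilities."""
    r = rng.random()
    cumulative = 0.0
    for index, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return index
    return len(probs) - 1  # guard against floating-point round-off


def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate_document(n_words, alpha, beta, psi, rng):
    """Sample (word, topic, selector) triples for one document.

    alpha: Dirichlet prior over topics (assumed, as in standard LDA).
    beta[k][v]: probability of word v under topic k.
    psi[k][v]: hypothetical "discrimination power" of word v for topic k,
               used here as the Bernoulli parameter of the selector.
    """
    theta = sample_dirichlet(alpha, rng)  # per-document topic proportions
    triples = []
    for _ in range(n_words):
        z = sample_categorical(theta, rng)        # topic assignment
        w = sample_categorical(beta[z], rng)      # word drawn from that topic
        s = 1 if rng.random() < psi[z][w] else 0  # Bernoulli selection
        triples.append((w, z, s))
    return triples


def selective_topic_counts(triples, n_topics):
    """Count topic assignments over selected (s == 1) words only."""
    counts = [0] * n_topics
    for _, z, s in triples:
        if s == 1:
            counts[z] += 1
    return counts


if __name__ == "__main__":
    rng = random.Random(0)
    alpha = [0.5, 0.5]                 # two topics
    beta = [[0.7, 0.2, 0.1],           # topic 0 over a 3-word vocabulary
            [0.1, 0.2, 0.7]]           # topic 1
    psi = [[0.9, 0.3, 0.1],            # toy discrimination powers
           [0.1, 0.3, 0.9]]
    doc = generate_document(20, alpha, beta, psi, rng)
    print(selective_topic_counts(doc, n_topics=2))
```

In this sketch a word with a high `psi` value for its topic is almost always kept, while a weakly discriminative word is usually dropped, so the per-document counts depend mostly on the words that characterize each topic. How the selected words enter the GLM response in ssLDA is specified in the paper itself and is not reproduced here.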
