Abstract: We propose selective supervised Latent Dirichlet Allocation (ssLDA) to improve the prediction performance of the widely studied supervised probabilistic topic models. We introduce a Bernoulli distribution for each word in a given document that selects the word as either strongly or weakly discriminative with respect to its assigned topic. The Bernoulli distribution is parameterized by the discrimination power of the word for its assigned topic. As a result, a document is represented as a "bag-of-selective-words", a new perspective that differs from both the probabilistic "bag-of-topics" of the topic modeling domain and the flat "bag-of-words" of traditional natural language processing. Because it inherits the general framework of supervised LDA (sLDA), ssLDA can also predict many types of response specified by a generalized linear model (GLM). Focusing in this paper on applying the word selection mechanism to single-label document classification, we use variational inference to approximate the intractable posterior and derive maximum-likelihood estimates of the parameters of ssLDA. Experiments on textual documents show that ssLDA not only achieves classification performance competitive with "state-of-the-art" approaches based on both the flat "bag-of-words" and the probabilistic "bag-of-topics" representations, but also discovers the discrimination power of the words within each topic, in a way consistent with our prior knowledge.
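A minimal sketch of the word-selection step described above, assuming the discrimination power is already available as a probability for each (topic, word) pair. The function names `select_words` and `bag_of_selective_words` and the dictionary layout of `discrimination` are hypothetical illustrations, not taken from the paper. The sketch draws one Bernoulli selector per token and collects the resulting (word, topic, strength) triples to show what a "bag-of-selective-words" might look like. It does not implement the variational inference, parameter estimation, or GLM response of ssLDA.

```python
import random
from collections import Counter


def select_words(words, topics, discrimination, rng):
    """Draw one Bernoulli selector per token.

    words[i] is the i-th token and topics[i] its assigned topic.
    discrimination[(topic, word)] is a HYPOTHETICAL lookup giving the
    probability that the word is strongly discriminative for that topic.
    """
    selectors = []
    for w, k in zip(words, topics):
        p = discrimination.get((k, w), 0.0)  # unseen pairs default to weak
        selectors.append(rng.random() < p)   # Bernoulli(p): True means strong
    return selectors


def bag_of_selective_words(words, topics, selectors):
    """Multiset of (word, topic, strength) triples for one document."""
    return Counter(
        (w, k, "strong" if s else "weak")
        for w, k, s in zip(words, topics, selectors)
    )


if __name__ == "__main__":
    rng = random.Random(0)
    words = ["goal", "the", "goal"]
    topics = [0, 0, 0]
    disc = {(0, "goal"): 1.0, (0, "the"): 0.0}  # illustrative values only
    sel = select_words(words, topics, disc, rng)
    print(bag_of_selective_words(words, topics, sel))
```

With a probability of 1.0 the token is always marked strong and with 0.0 always weak, so in the usage example both occurrences of "goal" count as strong and "the" counts as weak; intermediate probabilities yield a random mix across draws.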