Abstract: We address two challenges of probabilistic topic modelling in order to better estimate the probability of a word in a given context, i.e., P(word|context): (1) No Language Structure in Context: Probabilistic topic models (TMs) ignore word order by summarizing a given context as a "bag-of-words", and consequently the semantics of words in the context is lost. An LSTM-based language model (LSTM-LM) learns a vector-space representation of each word by accounting for word order in local collocation patterns and models complex characteristics of language (e.g., syntax and semantics), while a TM learns a latent representation from the entire document and discovers its underlying thematic structure. We unite these two complementary paradigms of learning the meaning of word occurrences by combining a TM (e.g., DocNADE) and an LM in a unified probabilistic framework, named ctx-DocNADE. (2) Limited Context and/or Small Training Corpus of Documents: In settings with a small number of word occurrences (i.e., lack of context) in short texts, or with data sparsity in a corpus of few documents, applying TMs is challenging. We address this challenge by incorporating external knowledge into neural autoregressive topic models via a language modelling approach: we use word embeddings as input to an LSTM-LM with the aim of improving the word-topic mapping on a smaller and/or short-text corpus. The proposed DocNADE extension is named ctx-DocNADEe. We present novel neural autoregressive topic model variants, coupled with neural LMs and embedding priors, that consistently outperform state-of-the-art generative TMs in terms of generalization (perplexity), interpretability (topic coherence) and applicability (retrieval and classification) over 6 long-text and 8 short-text datasets from diverse domains.
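The abstract describes combining DocNADE's bag-of-words view of the preceding words with an LSTM-LM's order-aware view, but it gives no equations. The sketch below rests on several assumptions. First, it assumes a DocNADE-style factorization p(doc) = ∏ p(v_i | v_<i) with a sigmoid hidden state and a softmax over the vocabulary. Second, it assumes a simple additive mix, with the combined hidden state equal to the topic hidden state plus λ times a language-model state. Third, the LM states are passed in precomputed and are not produced by a real LSTM. All function and parameter names (`autoregressive_log_likelihood`, `perplexity`, `lm_states`, `lam`) are hypothetical, and the code should not be read as the authors' implementation. It also shows how perplexity, the generalization metric named above, follows from the per-word conditionals.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def autoregressive_log_likelihood(doc, W, U, b, c, lm_states=None, lam=0.0):
    """Sum of log p(v_i | v_<i) for a DocNADE-style model (illustrative sketch).

    doc       : word indices v_1..v_D
    W         : H x K input weights; column W[:, v] is added once word v is seen
    U, b      : K x H output weights and K output biases (softmax over vocabulary)
    c         : H hidden biases
    lm_states : optional list of D vectors of length H; lm_states[i] is ASSUMED to
                summarize the ordered prefix v_<i (hypothetical stand-in for an
                LSTM-LM hidden state)
    lam       : hypothetical mixing weight for the LM state (additive mix assumed)
    """
    H, K = len(c), len(b)
    pre = list(c)  # c + sum_{k<i} W[:, v_k]: an order-free summary of the prefix
    total = 0.0
    for i, v in enumerate(doc):
        h = [sigmoid(a) for a in pre]
        if lm_states is not None:
            # Add word-order information from the LM on top of the topic state.
            h = [hj + lam * lj for hj, lj in zip(h, lm_states[i])]
        logits = [b[w] + sum(U[w][j] * h[j] for j in range(H)) for w in range(K)]
        m = max(logits)  # log-sum-exp for a numerically stable softmax
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        total += logits[v] - log_z
        for j in range(H):  # only now does v_i join the conditioning context
            pre[j] += W[j][v]
    return total


def perplexity(docs, **params):
    """exp of the average per-word negative log-likelihood over a corpus."""
    nll = -sum(autoregressive_log_likelihood(d, **params) for d in docs)
    n_words = sum(len(d) for d in docs)
    return math.exp(nll / n_words)


if __name__ == "__main__":
    random.seed(0)
    K, H = 5, 3  # toy vocabulary size and hidden size
    params = dict(
        W=[[random.gauss(0, 0.1) for _ in range(K)] for _ in range(H)],
        U=[[random.gauss(0, 0.1) for _ in range(H)] for _ in range(K)],
        b=[0.0] * K,
        c=[0.0] * H,
    )
    docs = [[0, 2, 2, 4], [1, 3]]
    lm = [[random.gauss(0, 1) for _ in range(H)] for _ in docs[0]]
    print("topic-only perplexity:", perplexity(docs, **params))
    print("log p(doc 0) with LM mix:",
          autoregressive_log_likelihood(docs[0], lm_states=lm, lam=0.5, **params))
```

With all weights and biases set to zero, every conditional is uniform, so the perplexity equals the vocabulary size. This makes a useful sanity check. In the ctx-DocNADEe setting described above, the LM states would come from an LSTM whose inputs are pretrained word embeddings. That component is not sketched here.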

A Bayesian Nonparametric Topic Model with Variational Auto-Encoders

Nonparametric Topic Modeling with Neural Inference

Variational Gaussian Topic Model with Invertible Neural Projections

A neural topic model with word vectors and entity vectors for short texts

Topic Modeling with Wasserstein Autoencoders

Incorporating Knowledge Graph Embeddings into Topic Modeling

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

vONTSS: vMF based semi-supervised neural topic modeling with optimal transport

A Discrete Variational Recurrent Topic Model without the Reparametrization Trick

Learning Multilingual Topics with Neural Variational Inference

GraphBTM: Graph Enhanced Autoencoded Variational Inference for Biterm Topic Model

Probabilistic Topic Modelling with Transformer Representations

S2vNTM: Semi-supervised vMF Neural Topic Modeling

textTOvec: Deep Contextualized Neural Autoregressive Topic Models of Language with Distributed Compositional Prior

Neural Topic Modeling with Bidirectional Adversarial Training

SenGen: Sentence Generating Neural Variational Topic Model

Deep Autoencoding Topic Model With Scalable Hybrid Bayesian Inference

Neural Topic Modeling with Deep Mutual Information Estimation

Integration of Neural Embeddings and Probabilistic Models in Topic Modeling

A Disentangled Adversarial Neural Topic Model for Separating Opinions from Plots in User Reviews

Topic-VQ-VAE: Leveraging Latent Codebooks for Flexible Topic-Guided Document Generation