Abstract: With the emergence and development of deep generative models such as variational auto-encoders (VAEs), research on topic modeling has extended to a new area, neural topic modeling, which aims to learn disentangled topics to better understand the data. However, the original VAE framework has been shown to be limited in disentanglement performance, and neural topic models (NTMs) built on it inherit this defect. In this paper, we argue that the optimization objectives of contrastive learning are consistent with two important goals of well-disentangled topic learning, alignment and uniformity. They are also consistent with two key evaluation measures for topic models, topic coherence and topic diversity. We therefore conclude that the alignment and uniformity of disentangled topic learning can be quantified with topic coherence and topic diversity. Accordingly, we propose the Contrastive Disentangled Neural Topic Model (CNTM). By representing both words and topics as low-dimensional vectors in the same embedding space, we apply contrastive learning to neural topic modeling to produce factorized and disentangled topics in an interpretable manner. We compare CNTM with strong baseline models on widely used metrics. Under the most general evaluation setting (100% of topics selected), our model achieves the best topic coherence scores, improving on the second-best models' scores by 25.0%, 10.9%, 24.6%, and 51.3% on the 20 Newsgroups, Web Snippets, Tag My News, and Reuters datasets, respectively. Our method also obtains the second-best topic diversity scores on the 20 Newsgroups and Web Snippets datasets. These experimental results show that CNTM can effectively leverage the disentanglement ability of contrastive learning to address the inherent defect of neural topic modeling and obtain better topic quality.
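
To make the terms in the abstract concrete, the following is a minimal sketch of how alignment, uniformity, and topic diversity are commonly computed. It is not the CNTM objective, which the abstract does not specify. The alignment and uniformity formulas follow the widely used formulation from the contrastive representation learning literature, and topic diversity is taken as the fraction of unique words among the top words of all topics. The function names, the parameters `alpha` and `t`, and the toy vectors are illustrative assumptions.

```python
import math
from itertools import combinations


def alignment(positive_pairs, alpha=2):
    """Mean distance**alpha between positive pairs (lower = better aligned).

    Assumption: a positive pair could be, e.g., a topic vector and one of
    its top-word vectors in the shared embedding space.
    """
    return sum(math.dist(a, b) ** alpha for a, b in positive_pairs) / len(positive_pairs)


def uniformity(vectors, t=2.0):
    """Log of the mean Gaussian potential over all distinct pairs.

    Lower values mean the vectors are spread more evenly.
    """
    potentials = [
        math.exp(-t * math.dist(a, b) ** 2) for a, b in combinations(vectors, 2)
    ]
    return math.log(sum(potentials) / len(potentials))


def topic_diversity(topics_top_words):
    """Fraction of unique words among the top words of all topics."""
    words = [w for topic in topics_top_words for w in topic]
    return len(set(words)) / len(words)


# Toy, hand-made unit vectors (hypothetical data, for illustration only).
topic_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
word_vecs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

print("alignment:", alignment(list(zip(topic_vecs, word_vecs))))
print("uniformity:", uniformity(topic_vecs))
print("diversity:", topic_diversity([["game", "team"], ["team", "court"]]))
```

Under this reading, low alignment loss corresponds to topics whose top words sit close to the topic vector (related to coherence), and low uniformity loss corresponds to topics that are spread apart (related to diversity). This pairing is an interpretation of the abstract, not a statement of the paper's derivation.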

Improving Topic Disentanglement Via Contrastive Learning
