TopicGPT: A Prompt-based Topic Modeling Framework

Chau Minh Pham, Alexander Hoyle, Simeng Sun, Philip Resnik, Mohit Iyyer

2024-04-02

Abstract: Topic modeling is a well-established technique for exploring text corpora. Conventional topic models (e.g., LDA) represent topics as bags of words that often require "reading the tea leaves" to interpret; additionally, they offer users minimal control over the formatting and specificity of resulting topics. To tackle these issues, we introduce TopicGPT, a prompt-based framework that uses large language models (LLMs) to uncover latent topics in a text collection. TopicGPT produces topics that align better with human categorizations compared to competing methods: it achieves a harmonic mean purity of 0.74 against human-annotated Wikipedia topics compared to 0.64 for the strongest baseline. Its topics are also interpretable, dispensing with ambiguous bags of words in favor of topics with natural language labels and associated free-form descriptions. Moreover, the framework is highly adaptable, allowing users to specify constraints and modify topics without the need for model retraining. By streamlining access to high-quality and interpretable topics, TopicGPT represents a compelling, human-centered approach to topic modeling.
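The abstract reports a harmonic mean purity score without spelling out the computation. The sketch below shows one common way such a score is computed, assuming it is the harmonic mean of purity and inverse purity between predicted topic assignments and human labels. The function names and the toy labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def purity(predicted, gold):
    """Share of documents that belong to the majority gold label of their predicted cluster."""
    if not gold or len(predicted) != len(gold):
        raise ValueError("predicted and gold must be non-empty and of equal length")
    clusters = {}
    for p, g in zip(predicted, gold):
        clusters.setdefault(p, Counter())[g] += 1
    return sum(max(counts.values()) for counts in clusters.values()) / len(gold)


def harmonic_mean_purity(predicted, gold):
    """Harmonic mean of purity and inverse purity (assumed definition of the metric)."""
    p = purity(predicted, gold)
    inverse_p = purity(gold, predicted)  # inverse purity swaps the two roles
    return 2 * p * inverse_p / (p + inverse_p)


# Toy data: six documents, three human labels, three predicted topics (all hypothetical).
gold = ["sports", "sports", "politics", "politics", "science", "science"]
pred = ["A", "A", "A", "B", "C", "C"]
score = harmonic_mean_purity(pred, gold)
print(round(score, 3))
```

Purity alone rewards many tiny clusters and inverse purity alone rewards one giant cluster, so combining the two penalizes both degenerate solutions.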

Computation and Language

What problem does this paper attempt to address?

The paper targets several limitations of conventional topic models (such as LDA) for topic discovery in text:

1. **Poor interpretability**: Conventional topic models represent each topic as a bag of words, which is hard to read and interpret directly.
2. **Lack of user control**: Existing methods give users little control over the format and specificity of the generated topics.
3. **Consistency and accuracy**: Generated topics should agree more closely with the ground-truth topics annotated by humans.

To address these issues, the authors introduce TopicGPT, a framework that prompts large language models to generate topics and assign them to documents. Each topic carries a natural language label and a free-form description, which makes it easier to interpret, and users can specify constraints and modify topics without retraining a model. According to the reported experiments, TopicGPT yields topics that are more consistent with human annotations and more stable across multiple datasets than baselines such as LDA, SeededLDA, and BERTopic.
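The generate-then-assign workflow described above can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `LLM` callable interface, the prompt wording, and the keyword-based `toy_llm` stand-in are hypothetical and are not the prompts or API used by the authors.

```python
from typing import Callable, Dict, List, Sequence

LLM = Callable[[str], str]  # hypothetical interface: prompt text in, answer text out


def build_prompt(task: str, topics: Sequence[str], doc: str) -> str:
    """Assemble an illustrative prompt; the wording is an assumption, not the paper's."""
    topic_block = "\n".join(topics) if topics else "(none yet)"
    return f"{task}\n\nTopics:\n{topic_block}\n\nDocument:\n{doc}\n\nAnswer:"


def generate_topics(docs: Sequence[str], llm: LLM, seed_topics: Sequence[str] = ()) -> List[str]:
    """Grow a topic list by asking the model to reuse a listed topic or propose a new one."""
    topics = list(seed_topics)
    task = "Return a listed topic that fits, or propose a new one as 'Label: description'."
    for doc in docs:
        answer = llm(build_prompt(task, topics, doc)).strip()
        if answer and answer not in topics:
            topics.append(answer)
    return topics


def assign_topics(docs: Sequence[str], topics: Sequence[str], llm: LLM) -> Dict[str, str]:
    """Ask the model to pick one listed topic per document."""
    task = "Return the single listed topic that best fits."
    return {doc: llm(build_prompt(task, topics, doc)).strip() for doc in docs}


def toy_llm(prompt: str) -> str:
    """Keyword-based stand-in for a real model so the sketch runs offline."""
    doc = prompt.split("Document:\n", 1)[1].split("\n\nAnswer:", 1)[0].lower()
    if "election" in doc or "senate" in doc:
        return "Politics: Government, elections, and legislation."
    return "Sports: Athletic competitions and teams."


docs = ["The senate passed the bill.", "The team won the final.", "Election turnout rose."]
topics = generate_topics(docs, toy_llm)
assignments = assign_topics(docs, topics, toy_llm)
for doc, topic in assignments.items():
    print(f"{doc} -> {topic}")
```

Because the topics are plain strings, a user could edit, merge, or seed the `topics` list by hand between the two stages, which mirrors the no-retraining adaptability the summary describes.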

