Abstract: Topic models have been evolving rapidly over the years, from conventional to recent neural models. However, existing topic models generally struggle with either effectiveness, efficiency, or stability, highly impeding their practical applications. In this paper, we propose FASTopic, a fast, adaptive, stable, and transferable topic model. FASTopic follows a new paradigm: Dual Semantic-relation Reconstruction (DSR). Instead of previous conventional, VAE-based, or clustering-based methods, DSR directly models the semantic relations among document embeddings from a pretrained Transformer and learnable topic and word embeddings. By reconstructing through these semantic relations, DSR discovers latent topics. This brings about a neat and efficient topic modeling framework. We further propose a novel Embedding Transport Plan (ETP) method. Rather than early straightforward approaches, ETP explicitly regularizes the semantic relations as optimal transport plans. This addresses the relation bias issue and thus leads to effective topic modeling. Extensive experiments on benchmark datasets demonstrate that our FASTopic shows superior effectiveness, efficiency, adaptivity, stability, and transferability, compared to state-of-the-art baselines across various scenarios.

What problem does this paper attempt to address?

This paper addresses the shortcomings of existing topic models in efficiency, effectiveness, and stability. Specifically:

1. **Efficiency problem**: Although existing topic models based on variational auto-encoders (VAEs) perform well, they have high computational complexity and take a long time to process large-scale datasets. For example, some models may take several hours to process a dataset containing 10,000 documents.
2. **Effectiveness problem**: Although clustering-based methods are efficient, they often generate repetitive topics that lack diversity, and they produce inaccurate document-topic distributions.
3. **Stability problem**: Existing neural topic models are very sensitive to hyperparameters, and their performance fluctuates greatly across scenarios, especially when the data domain, vocabulary size, or document length changes.

To solve these problems, the paper proposes a new topic model, FASTopic. Its main features are as follows:

- **Fast**: Improves computational efficiency by simplifying the model structure.
- **Adaptive**: Maintains good performance across different scenarios.
- **Stable**: Is insensitive to hyperparameters and performs more consistently.
- **Transferable**: Can be applied effectively to different datasets and tasks.

FASTopic introduces a new paradigm, Dual Semantic-relation Reconstruction (DSR), and regularizes semantic relations through the Embedding Transport Plan (ETP) method. Specifically:

- **DSR paradigm**: Directly models the semantic relations among document embeddings, topic embeddings, and word embeddings, and discovers latent topics by reconstructing through these relations.
- **ETP method**: Models semantic relations as optimal transport plans, which avoids the relation bias problem and yields more discriminative topics and more accurate document-topic distributions.

With these innovations, the experimental results on multiple benchmark datasets show that FASTopic outperforms existing state-of-the-art methods in efficiency, effectiveness, adaptivity, stability, and transferability.
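To make the DSR and ETP ideas summarized above more concrete, here is a minimal NumPy sketch, not the authors' implementation. It treats the document-topic and topic-word relations as entropic-regularized optimal transport plans computed with Sinkhorn iterations, then multiplies them to reconstruct a word distribution per document. The function names (`sinkhorn_plan`, `normalized_sq_dist`), the uniform marginals, the squared-Euclidean cost, the value of `epsilon`, and the random stand-in embeddings are all assumptions made for illustration; the summary does not specify these details.

```python
import numpy as np


def sinkhorn_plan(cost, a, b, epsilon=0.5, n_iters=200):
    """Entropic-regularized transport plan with row marginals a and column marginals b.

    Hypothetical helper: a standard Sinkhorn iteration, assumed here as a
    stand-in for how a transport plan between embeddings could be computed.
    """
    kernel = np.exp(-cost / epsilon)
    u = np.ones_like(a)
    for _ in range(n_iters):
        v = b / (kernel.T @ u)  # rescale columns toward marginal b
        u = a / (kernel @ v)    # rescale rows toward marginal a
    return u[:, None] * kernel * v[None, :]


def normalized_sq_dist(x, y):
    """Pairwise squared Euclidean distances, scaled to [0, 1] for numerical stability."""
    dist = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)
    return dist / dist.max()


# Random stand-ins: in the paper, document embeddings come from a pretrained
# Transformer, while topic and word embeddings are learnable parameters.
rng = np.random.default_rng(0)
n_docs, n_topics, n_words, dim = 6, 3, 8, 4
doc_emb = rng.normal(size=(n_docs, dim))
topic_emb = rng.normal(size=(n_topics, dim))
word_emb = rng.normal(size=(n_words, dim))

# Uniform marginals are an assumption of this sketch.
doc_w = np.full(n_docs, 1.0 / n_docs)
topic_w = np.full(n_topics, 1.0 / n_topics)
word_w = np.full(n_words, 1.0 / n_words)

# Relation 1: documents <-> topics. Relation 2: topics <-> words.
doc_topic_plan = sinkhorn_plan(normalized_sq_dist(doc_emb, topic_emb), doc_w, topic_w)
topic_word_plan = sinkhorn_plan(normalized_sq_dist(topic_emb, word_emb), topic_w, word_w)

# Rescale each plan so every row is a probability distribution.
doc_topic = doc_topic_plan * n_docs        # topic distribution per document
topic_word = topic_word_plan * n_topics    # word distribution per topic

# Reconstruct a word distribution for each document through the two relations.
reconstruction = doc_topic @ topic_word
print(reconstruction.shape)
```

Because each plan must match a fixed marginal over topics, no single topic can absorb all documents or all words, which is one way to read the "relation bias" the summary says ETP avoids. In a trainable version, the topic and word embeddings would be updated so that `reconstruction` matches the observed word counts of each document.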

FASTopic: Pretrained Transformer is a Fast, Adaptive, Stable, and Transferable Topic Model

FASTopic: A Fast, Adaptive, Stable, and Transferable Topic Modeling Paradigm

TFEformer: Temporal Feature Enhanced Transformer for Multivariate Time Series Forecasting

TransVOS: Video Object Segmentation with Transformers

Probabilistic Topic Modelling with Transformer Representations

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

Fastformer: Additive Attention Can Be All You Need

FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting

TS-Fastformer: Fast Transformer for Time-Series Forecasting

Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding

Faster Depth-Adaptive Transformers

TAN-NTM: Topic Attention Networks for Neural Topic Modeling

Sys-TM: A Fast and General Topic Modeling System

vONTSS: vMF based semi-supervised neural topic modeling with optimal transport

A Study of Text Vectorization Method Combining Topic Model and Transfer Learning

$FastDoc$: Domain-Specific Fast Continual Pre-training Technique using Document-Level Metadata and Taxonomy

Sampling Foundational Transformer: A Theoretical Perspective

TopicFM+: Boosting Accuracy and Efficiency of Topic-Assisted Feature Matching

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Towards the TopMost: A Topic Modeling System Toolkit

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning