Abstract: In recent years, short texts have become prevalent on the internet. Because each text is short, conventional topic models applied to short texts suffer from sparse word co-occurrence information. Researchers have proposed various topic models customized for short texts that supply additional word co-occurrence information. However, these models cannot incorporate sufficient semantic word co-occurrence information and may introduce additional noise. To address these issues, we propose a self-aggregated topic model that incorporates document embeddings. Aggregating short texts into long documents according to their document embeddings can provide sufficient word co-occurrence information while avoiding non-semantic word co-occurrences. However, the document embeddings of short texts themselves contain a lot of noise, which also results from the sparsity of word co-occurrence information. We therefore discard this noise by transforming the document embeddings into global and local semantic information. The global semantic information is the similarity probability distribution over the entire dataset, and the local semantic information is the distances between similar short texts. We then adopt a nested Chinese restaurant process to incorporate these two kinds of information. Finally, we compare our model with several state-of-the-art models on four real-world short-text corpora. The experimental results show that our model achieves better topic coherence and classification accuracy.
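The abstract combines three ingredients: a global similarity distribution over the whole corpus, distances between similar texts, and a Chinese restaurant process that groups short texts into longer documents. The sketch below illustrates each ingredient under stated assumptions; it is not the authors' implementation. Assumed: embeddings are plain lists of floats from some unspecified document encoder; the global signal is a temperature-scaled softmax over cosine similarities (the temperature `tau` is hypothetical); the local signal is Euclidean distance to the `k` nearest texts; and the grouping step is a plain, similarity-weighted CRP, not the nested CRP the paper adopts. All function names are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def global_similarity(embeddings, i, tau=0.1):
    """Hypothetical 'global' signal: a softmax over cosine similarities between
    text i and every other text in the corpus. Needs at least two texts."""
    scores = {j: cosine(embeddings[i], e) / tau
              for j, e in enumerate(embeddings) if j != i}
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {j: math.exp(s - m) for j, s in scores.items()}
    z = sum(exps.values())
    return {j: v / z for j, v in exps.items()}


def local_neighbors(embeddings, i, k=2):
    """Hypothetical 'local' signal: (index, Euclidean distance) of the k nearest texts."""
    dists = sorted((math.dist(embeddings[i], e), j)
                   for j, e in enumerate(embeddings) if j != i)
    return [(j, d) for d, j in dists[:k]]


def aggregate_crp(embeddings, alpha=0.01, tau=0.1, seed=0):
    """Group texts into long documents with a similarity-weighted (non-nested) CRP.
    Joining an existing table is weighted by the global similarity mass of the
    texts already seated there; opening a new table is weighted by alpha (> 0)."""
    rng = random.Random(seed)
    tables, assignment = [], []
    for i in range(len(embeddings)):
        probs = global_similarity(embeddings, i, tau) if len(embeddings) > 1 else {}
        weights = [sum(probs[j] for j in t) for t in tables]
        weights.append(alpha)  # weight of opening a new table (long document)
        choice = rng.choices(range(len(weights)), weights=weights)[0]
        if choice == len(tables):
            tables.append([i])
        else:
            tables[choice].append(i)
        assignment.append(choice)
    return assignment, tables
```

With two tight clusters such as `[[1, 0], [0.99, 0.1], [0, 1], [0.1, 0.99]]`, text 0's global distribution puts most of its mass on text 1, and its nearest local neighbor is also text 1. A smaller `alpha` makes the CRP favor existing tables and therefore produces fewer, longer aggregated documents.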

Utilizing Recurrent Neural Network for Topic Discovery in Short Text Scenarios

Don't Forget the Quantifiable Relationship between Words: Using Recurrent Neural Network for Short Text Topic Discovery

Short Text Understanding by Leveraging Knowledge into Topic Model

Incorporating Knowledge into Neural Network for Text Representation

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Context Reinforced Neural Topic Modeling over Short Texts

A Joint Model of Extended LDA and IBTM over Streaming Chinese Short Texts

BTM: Topic Modeling over Short Texts

Using Topic Modeling Methods for Short-Text Data: A Comparative Analysis

Constructing Pseudo Documents With Semantic Similarity For Short Text Topic Discovery

A Nested Chinese Restaurant Topic Model for Short Texts with Document Embeddings

Modeling over Short Texts

A Biterm Topic Model for Short Texts

Intensity of Relationship Between Words: Using Word Triangles in Topic Discovery for Short Texts

Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery

Short Text Topic Modeling with Topic Distribution Quantization and Negative Sampling Decoder

Let the Pretrained Language Models "Imagine" for Short Texts Topic Modeling

Topic Modeling over Short Texts by Incorporating Word Embeddings

Short Text Topic Modeling Techniques, Applications, and Performance: A Survey

Bi-Directional Recurrent Attentional Topic Model

Topic Model Based on Co-occurrence Word Networks for Unbalanced Short Text Datasets