Abstract: Deriving an effective document representation is a critical challenge in many downstream NLP tasks, especially when documents are very short, because short texts suffer from sparsity and noise. Some approaches employ latent topic models, based on global word co-occurrence, to obtain a topic distribution as the representation. Others leverage word embeddings, which capture local conditional dependencies, and represent a document as the sum of its word vectors. Unlike existing works, which use one to help the other (topic models for word embeddings or vice versa), we propose CME-DMM, a collaboratively modeling and embedding framework for capturing coherent latent topics from short texts. CME-DMM combines topic and word embeddings through an attention mechanism and incorporates them into the latent topic model, which significantly improves the quality of the latent topics. Extensive experiments demonstrate that CME-DMM discovers more coherent topics than other popular methods, resulting in better performance in downstream NLP tasks such as classification. Besides the interpretable latent topics, the corresponding topic embeddings describe the meanings of the latent topics in the semantic space. The attention vectors, a by-product of the learning process, can identify the keywords in noisy short texts.
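
The contrast the abstract draws between a plain sum of word vectors and an attention-weighted combination can be illustrated with a minimal sketch. This is not the CME-DMM formulation, which the abstract does not specify: the dot-product scoring, the softmax normalization, the function names, and the toy two-dimensional vectors are all assumptions chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def sum_embedding(word_vecs):
    """Baseline: represent a document as the sum of its word vectors."""
    return word_vecs.sum(axis=0)


def attention_embedding(word_vecs, topic_vec):
    """Hypothetical attention: score each word against a topic embedding.

    Scores are dot products (an assumption, not taken from the paper);
    the weights are their softmax, and the document vector is the
    weighted sum of the word vectors.
    """
    scores = word_vecs @ topic_vec
    weights = softmax(scores)
    return weights @ word_vecs, weights


# Toy short text with made-up 2-D word embeddings.
words = ["goal", "lol", "match", "the"]
word_vecs = np.array([
    [0.9, 0.1],
    [0.0, 0.1],
    [0.8, 0.2],
    [0.1, 0.0],
])
topic_vec = np.array([1.0, 0.0])  # made-up topic embedding

doc_sum = sum_embedding(word_vecs)
doc_vec, weights = attention_embedding(word_vecs, topic_vec)

# The word with the largest attention weight is treated as the keyword.
keyword = words[int(np.argmax(weights))]
```

In this toy setting the attention weights rank the words by their alignment with the topic embedding, which is one way such weights could surface keywords in a noisy short text.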

Topic Modeling over Short Texts by Incorporating Word Embeddings

Short Text Understanding by Leveraging Knowledge into Topic Model

Mining Coherent Topics in Documents Using Word Embeddings and Large-Scale Text Data

Incorporating Knowledge Graph Embeddings into Topic Modeling

Topic Modeling in Embedding Spaces

Short Text Topic Modeling With Flexible Word Patterns

Incorporating Biterm Correlation Knowledge into Topic Modeling for Short Texts

BTM: Topic Modeling over Short Texts

Keyword Assisted Embedded Topic Model

A Correlated Topic Model Using Word Embeddings

Modeling over Short Texts

Collaboratively Modeling and Embedding of Latent Topics for Short Texts

Topic Modeling Using Distributed Word Embeddings

Representing Mixtures of Word Embeddings with Mixtures of Topic Embeddings

A biterm topic model for short texts

Research on Improve Topic Representation over Short Text

A Joint Model of Extended LDA and IBTM over Streaming Chinese Short Texts

Targeted Aspects Oriented Topic Modeling for Short Texts

TSSE-DMM: Topic Modeling for Short Texts Based on Topic Subdivision and Semantic Enhancement

Mitigating Data Sparsity for Short Text Topic Modeling by Topic-Semantic Contrastive Learning

A CWTM Model of Topic Extraction for Short Text