Exploiting Global Semantic Similarity Biterms for Short-Text Topic Discovery

Heng-yang Lu, Gao-jian Ge, Yun Li, Chong-jun Wang, Jun-yuan Xie
DOI: https://doi.org/10.1109/ictai.2018.00151
2018-01-01
Abstract: The demand for mining massive short-text data from the Internet has promoted research on topic models. Many schemes try to solve the sparsity problem caused by short texts, mainly through data aggregation or model improvement. Among them, the Biterm Topic Model (BTM) changes the way topics are modeled: it works on document-level biterms (unordered word pairs that co-occur in the same short text), an approach that has proven novel and effective. However, it may ignore word pairs that are semantically similar but rarely co-occur, which this paper calls global biterms. Inspired by the successful application of word embeddings in GPU-DMM, we exploit word embeddings to extract semantically similar word pairs from the whole corpus to help discover better topics. We call this model GloSS; it combines the advantages of the biterm approach to modeling topics and of word embeddings. Experimental results on two open-source, real-world datasets show that GloSS outperforms state-of-the-art topic models for short texts.
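To make the distinction between document-level and global biterms concrete, here is a minimal sketch, not the GloSS implementation. It assumes that a global biterm is any vocabulary pair whose embedding cosine similarity meets a threshold and whose document co-occurrence count is at most a cutoff. The helper names `doc_biterms` and `global_biterms`, the parameters `sim_threshold` and `max_cooccur`, and the toy 2-D embeddings are all hypothetical illustrations. The sketch does not show how GloSS feeds these biterms into topic inference.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def doc_biterms(doc):
    """BTM-style document-level biterms: every unordered pair of word
    positions in one short text, normalized so (a, b) == (b, a)."""
    return [tuple(sorted(pair)) for pair in combinations(doc, 2)]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sqrt(sum(a * a for a in u))
    nv = sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def cooccurrence_counts(corpus):
    """Number of documents in which each sorted word pair co-occurs."""
    counts = Counter()
    for doc in corpus:
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts


def global_biterms(corpus, embeddings, sim_threshold=0.7, max_cooccur=0):
    """Hypothetical global-biterm extraction: corpus-wide word pairs that are
    semantically similar (cosine >= sim_threshold) but co-occur in at most
    max_cooccur documents. Words without an embedding are skipped."""
    vocab = sorted({w for doc in corpus for w in doc if w in embeddings})
    counts = cooccurrence_counts(corpus)
    pairs = []
    for a, b in combinations(vocab, 2):  # a < b, matching the count keys
        if counts[(a, b)] > max_cooccur:
            continue  # already captured as a document-level biterm
        if cosine(embeddings[a], embeddings[b]) >= sim_threshold:
            pairs.append((a, b))
    return pairs


# Toy data (illustrative only; real use would load pretrained embeddings).
corpus = [
    ["apple", "iphone", "launch"],
    ["samsung", "galaxy", "phone"],
    ["apple", "fruit", "juice"],
]
embeddings = {
    "iphone": [1.0, 0.1],
    "phone": [0.9, 0.2],
    "juice": [0.1, 1.0],
    "fruit": [0.2, 0.9],
    "apple": [0.6, 0.6],
}

print(doc_biterms(corpus[0]))
# [('apple', 'iphone'), ('apple', 'launch'), ('iphone', 'launch')]
print(global_biterms(corpus, embeddings, sim_threshold=0.9))
# [('iphone', 'phone')]
```

In this toy corpus "iphone" and "phone" never appear in the same text, so BTM alone would never pair them. Their embeddings, however, are nearly parallel, so they surface as a global biterm. Pairs such as ("fruit", "juice") already co-occur and remain document-level biterms.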