Automatic Labelling Of Topic Models Using Word Vectors And Letter Trigram Vectors

Wanqiu Kou, Fang Li, Timothy Baldwin
DOI: https://doi.org/10.1007/978-3-319-28940-3_20
2015-01-01
Abstract: The native representation of LDA-style topics is a multinomial distribution over words, which can be time-consuming to interpret directly. As an alternative representation, automatic labelling has been shown to help readers interpret the topics more efficiently. We propose a novel framework for topic labelling using word vectors and letter trigram vectors. We generate labels automatically and propose automatic and human evaluations of our method. First, we use a chunk parser to generate candidate labels, then map topics and candidate labels to word vectors and letter trigram vectors in order to find which candidate label is most semantically related to that topic. A label can be found by calculating the similarity between a topic and its candidate label vectors. Experiments on three common datasets show that not only the labelling method, but also our approach to automatic evaluation is effective.
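The ranking step described in the abstract (map a topic and its candidate labels into a shared vector space, then pick the label most similar to the topic) can be illustrated with a minimal sketch. This is not the authors' implementation: it uses raw letter trigram counts only and leaves out trained word vectors entirely. The `#` boundary padding, the probability-weighted sum used to build the topic vector, cosine as the similarity measure, and all function names and example data are assumptions made for illustration.

```python
import math
from collections import Counter


def letter_trigrams(text):
    """Count letter trigrams per word, padding each word with '#' (assumed scheme)."""
    counts = Counter()
    for word in text.lower().split():
        padded = f"#{word}#"
        for i in range(len(padded) - 2):
            counts[padded[i:i + 3]] += 1
    return counts


def topic_vector(top_words):
    """Probability-weighted sum of the trigram vectors of a topic's top words.

    The weighting is an assumption of this sketch, not taken from the paper.
    """
    vec = Counter()
    for word, prob in top_words:
        for trigram, n in letter_trigrams(word).items():
            vec[trigram] += prob * n
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_labels(top_words, candidates):
    """Return candidate labels ordered from most to least similar to the topic."""
    tvec = topic_vector(top_words)
    scored = [(cosine(tvec, letter_trigrams(c)), c) for c in candidates]
    return [label for _, label in sorted(scored, key=lambda pair: -pair[0])]


# Hypothetical topic (word, probability) pairs and candidate labels.
topic = [("network", 0.05), ("neural", 0.04), ("learning", 0.03)]
candidates = ["stock market", "machine learning", "neural network"]
ranked = rank_labels(topic, candidates)
print(ranked[0])
```

In this toy example the candidate sharing the most letter trigrams with the topic's top words is ranked first. In the setting the abstract describes, the candidates would come from a chunk parser, and word vectors would supply a second, semantic representation alongside the trigram one.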