EHLLDA: A Supervised Hierarchical Topic Model

Xian-Ling Mao, Yixuan Xiao, Qiang Zhou, Jun Wang, Heyan Huang
DOI: https://doi.org/10.1007/978-3-319-25816-4_18
2015-01-01
Abstract: In this paper, we consider the problem of modeling hierarchical labeled data, such as Web pages and their placement in hierarchical directories. The state-of-the-art model, hierarchical Labeled LDA (hLLDA), assumes that each child of a non-leaf label has equal importance, and that a document in the corpus cannot be located at a non-leaf node. In most cases, however, these assumptions do not hold in practice. We therefore introduce a supervised hierarchical topic model, Extended Hierarchical Labeled Latent Dirichlet Allocation (EHLLDA), which aims to relax the assumptions of hLLDA by incorporating prior information about labels into hLLDA. The experimental results show that EHLLDA achieves better perplexity than both LLDA and hLLDA on all four datasets, and that the proposed model is also superior to hLLDA in terms of P@n.
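The abstract names two evaluation measures, perplexity and P@n, without defining them. As a minimal sketch, the textbook definitions of both can be written as below. The function names `perplexity` and `precision_at_n`, their signatures, and the toy inputs are assumptions made for illustration; the paper's exact evaluation protocol is not given here and may differ.

```python
import math


def perplexity(log_probs):
    """Per-token perplexity of held-out text.

    log_probs: natural-log probabilities the model assigns to each
    held-out token (hypothetical input format). Lower is better.
    """
    if not log_probs:
        raise ValueError("need at least one token")
    return math.exp(-sum(log_probs) / len(log_probs))


def precision_at_n(ranked_labels, relevant_labels, n):
    """Fraction of the top-n ranked items that are relevant (P@n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    top_n = ranked_labels[:n]
    hits = sum(1 for label in top_n if label in relevant_labels)
    return hits / n


# Toy usage: a model that assigns probability 0.25 to every token
# behaves like a uniform choice among four words.
uniform_ppl = perplexity([math.log(0.25)] * 4)
p_at_2 = precision_at_n(["a", "b", "c", "d"], {"a", "c"}, 2)
```

Under these definitions, a lower perplexity means the model assigns higher probability to unseen text, and a higher P@n means more of the top-ranked predictions are correct.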