Semi-supervised Labeled Hierarchical Dirichlet Process Topic Model for Document Categorization

Yongzhong LI, Tao ZHENG
DOI: https://doi.org/10.16451/j.cnki.issn1003-6059.201712010
2018-01-01
Abstract: The Hierarchical Dirichlet Process (HDP) topic model can automatically learn the optimal structure of a topic set from data. However, the resulting topics may not meet semantic requirements, and in some labeled topic models the parameters are difficult to set. Therefore, based on known semantic labels and the degree of certainty of those labels, this paper proposes a semi-supervised labeled HDP topic model (SLHDP) together with an accuracy evaluation index for random clustering. The known semantic labels are assigned higher weight. Drawing on the property of the Dirichlet process that a finite space can be partitioned infinitely, the model is constructed via the Chinese restaurant process. Experimental results on several Chinese and English datasets show that the SLHDP model yields a more reasonable topic set in text classification on large-scale datasets.
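For readers unfamiliar with the Chinese restaurant process mentioned in the abstract, the following is a minimal sketch of the standard single-level process only, not the SLHDP model itself. The function name `crp_assignments` and the parameters `alpha` and `seed` are illustrative assumptions and do not come from the paper; the label weighting described in the abstract is not modeled here.

```python
import random
from typing import List, Optional


def crp_assignments(n_customers: int, alpha: float,
                    seed: Optional[int] = None) -> List[int]:
    """Sample table assignments from a basic Chinese restaurant process.

    Hypothetical helper for illustration: each customer joins an existing
    table with probability proportional to its current occupancy, or opens
    a new table with probability proportional to alpha.
    """
    if alpha <= 0:
        raise ValueError("alpha must be positive")
    rng = random.Random(seed)
    table_counts: List[int] = []   # number of customers seated at each table
    assignments: List[int] = []    # table index chosen by each customer
    for _ in range(n_customers):
        # Last weight corresponds to opening a new table.
        weights = table_counts + [alpha]
        table = rng.choices(range(len(weights)), weights=weights, k=1)[0]
        if table == len(table_counts):
            table_counts.append(1)      # open a new table
        else:
            table_counts[table] += 1    # join an existing table
        assignments.append(table)
    return assignments


if __name__ == "__main__":
    seating = crp_assignments(20, alpha=1.0, seed=0)
    print(seating)
    print("number of tables:", len(set(seating)))
```

In a topic-model reading, tables play the role of topics: the number of tables is not fixed in advance and grows with the data, which is the sense in which the topic set structure is learned rather than specified.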