Hierarchical Text Categorization with Probabilistic Topics

权小军,林洋港,罗奇鸣,陈恩红
2009-01-01
Journal of University of Science and Technology of China
Abstract: A probabilistic topic model is a statistical generative model that automatically extracts a set of topics from a collection of documents and represents each document as a mixture of these topics. The topics obtained in this way capture significant semantic information in the documents, and such models have broad applications in many fields. This paper proposes a novel approach to hierarchical text categorization based on the probabilistic topic model. The approach first extracts a set of topics using Gibbs sampling and then computes the similarity between each test document and each class based on these topics. Experiments on the 20 Newsgroups dataset show that this approach achieves better classification performance than support vector machines.
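The abstract states only the outline of the method: topics are extracted by Gibbs sampling, and each test document is then compared with each class in topic space. The Python sketch below fills in one plausible version, and every detail beyond that outline is an assumption rather than the authors' method. It assumes an LDA model fitted by collapsed Gibbs sampling with hypothetical hyperparameters `alpha=0.1` and `beta=0.01`. A test document's topic mixture is estimated by sampling with the topic-word distributions held fixed. Each class is represented by the mean topic mixture of its training documents. Cosine similarity is the similarity measure, and the hierarchy is handled by greedy top-down descent through a class tree. All function names are hypothetical.

```python
import math
import random


def gibbs_lda(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids.
    Hyperparameter defaults are illustrative assumptions, not from the paper."""
    rng = random.Random(seed)
    ndk = [[0] * n_topics for _ in docs]               # doc-topic counts
    nkw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    nk = [0] * n_topics                                # tokens per topic
    z = []                                             # topic of every token
    for d, doc in enumerate(docs):                     # random initialization
        z.append([])
        for w in doc:
            k = rng.randrange(n_topics)
            z[d].append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = z[d][i]                            # remove the current token
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # p(z=t | rest) ~ (n_dt + alpha) * (n_tw + beta) / (n_t + V*beta)
                weights = [(ndk[d][t] + alpha) * (nkw[t][w] + beta)
                           / (nk[t] + vocab_size * beta) for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k                            # add it back under the new topic
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1
    phi = [[(nkw[k][w] + beta) / (nk[k] + vocab_size * beta)
            for w in range(vocab_size)] for k in range(n_topics)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + n_topics * alpha)
              for k in range(n_topics)] for d, doc in enumerate(docs)]
    return phi, theta


def infer_theta(doc, phi, alpha=0.1, iters=50, seed=0):
    """Topic mixture of an unseen document, sampling topics with phi held fixed."""
    rng = random.Random(seed)
    n_topics = len(phi)
    z = [rng.randrange(n_topics) for _ in doc]
    nk = [z.count(t) for t in range(n_topics)]
    for _ in range(iters):
        for i, w in enumerate(doc):
            nk[z[i]] -= 1
            weights = [(nk[t] + alpha) * phi[t][w] for t in range(n_topics)]
            z[i] = rng.choices(range(n_topics), weights=weights)[0]
            nk[z[i]] += 1
    return [(nk[t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]


def cosine(a, b):
    """Cosine similarity (assumed measure; the abstract does not name one)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def leaves_under(node, tree):
    """All leaf classes in the subtree rooted at node (tree: node -> children)."""
    children = tree.get(node, [])
    if not children:
        return {node}
    return set().union(*(leaves_under(c, tree) for c in children))


def classify(theta_doc, theta, labels, tree, root="root"):
    """Assumed strategy: greedy top-down descent. At each level, pick the child
    whose centroid (mean topic mixture of its training docs) is most similar."""
    node = root
    while tree.get(node):
        best, best_sim = None, -1.0
        for child in tree[node]:
            leaves = leaves_under(child, tree)
            members = [theta[d] for d, y in enumerate(labels) if y in leaves]
            centroid = [sum(col) / len(members) for col in zip(*members)]
            sim = cosine(theta_doc, centroid)
            if sim > best_sim:
                best, best_sim = child, sim
        node = best
    return node
```

For 20 Newsgroups, a class tree could be read off the newsgroup names (for example, `rec` with children `rec.autos` and `rec.sport.hockey`), though the abstract does not say how the paper builds its hierarchy. Greedy descent is cheap because it only compares against the children of one node per level. Its drawback is that an error at an upper level cannot be corrected further down.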