Chinese text filtering based on domain keywords extracted from Wikipedia

Xiang Wang, Hu Li, Yan Jia, SongChang Jin
DOI: https://doi.org/10.1007/978-3-642-34522-7_104
2013-01-01
Abstract: Several machine learning and information retrieval algorithms have been used for text filtering. These methods share a common requirement: they need positive and negative examples to build a user profile. However, not all applications have access to good training documents. In this paper, we present a Wikipedia-based method that builds a user profile without any additional training documents. The proposed method extracts the keywords of a specific category from the Wikipedia taxonomy and computes the weights of the extracted keywords from Wikipedia pages. Experimental results on the Chinese news text dataset SogouC show that the proposed method achieves good performance. © 2013 Springer-Verlag.
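The pipeline described in the abstract (collect keywords for a category from the taxonomy, weight them using page content, then filter text with the resulting profile) can be illustrated with a minimal sketch. The abstract does not give the weighting formula or the filtering rule, so everything below is an assumption made for illustration: the toy `TAXONOMY` and `PAGES` data, the use of page titles as keywords, the page-frequency weighting, and the score threshold are all hypothetical and are not the authors' method.

```python
from collections import deque

# Hypothetical toy taxonomy: category -> (subcategories, page titles).
TAXONOMY = {
    "Sports": (["Football", "Basketball"], ["Olympic Games"]),
    "Football": ([], ["FIFA World Cup", "Goalkeeper"]),
    "Basketball": ([], ["NBA", "Slam dunk"]),
}

# Hypothetical page texts keyed by page title.
PAGES = {
    "Olympic Games": "The Olympic Games are held every four years.",
    "FIFA World Cup": "The FIFA World Cup is a football tournament; a goalkeeper guards the goal.",
    "Goalkeeper": "A goalkeeper defends the goal, for example at the FIFA World Cup.",
    "NBA": "The NBA is a basketball league famous for the slam dunk.",
    "Slam dunk": "A slam dunk is a basketball shot common in the NBA.",
}


def extract_keywords(root, max_depth=2):
    """Walk the category tree breadth-first; page titles become keywords."""
    keywords, seen = [], {root}
    queue = deque([(root, 0)])
    while queue:
        category, depth = queue.popleft()
        subcategories, titles = TAXONOMY.get(category, ([], []))
        keywords.extend(t for t in titles if t not in keywords)
        if depth < max_depth:
            for sub in subcategories:
                if sub not in seen:
                    seen.add(sub)
                    queue.append((sub, depth + 1))
    return keywords


def weight_keywords(keywords):
    """Assumed scheme: weight = share of the category's pages mentioning the keyword."""
    texts = [PAGES[k].lower() for k in keywords if k in PAGES]
    if not texts:
        return {k: 0.0 for k in keywords}
    return {k: sum(k.lower() in t for t in texts) / len(texts) for k in keywords}


def score(document, weights):
    """Sum the weights of all profile keywords that occur in the document."""
    doc = document.lower()
    return sum(w for k, w in weights.items() if k.lower() in doc)


def accept(document, weights, threshold=0.3):
    """Keep a document when its score reaches the (assumed) threshold."""
    return score(document, weights) >= threshold


profile = weight_keywords(extract_keywords("Sports"))
print(accept("The NBA final ended with a slam dunk.", profile))
print(accept("Stock markets fell today.", profile))
```

Substring matching is used here only because it sidesteps word segmentation, which a Chinese-text system would otherwise have to handle; a real implementation would also read the category graph and page text from Wikipedia data rather than from in-memory dictionaries.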