An Efficient Centroid Based Chinese Web Page Classifier

Liu Hui, Peng Ran, Ye Shaozhi, Li Xing
2003-01-01
Abstract: In this paper, we present an efficient centroid-based Chinese web page classifier that achieves satisfactory performance on real data and runs very fast in practical use. It is not only clearly designed but also has several creative features: Chinese word segmentation and noise filtering in the preprocessing module; a combined χ² statistics feature selection method; and adaptive factors that improve categorization performance. Another advantage of the system is its optimized implementation. Finally, we report experimental results on a corpus from Peking University, China, and discuss them.