Hierarchically Classifying Chinese Web Documents Without Dictionary Support And Segmentation Procedure

Sg Zhou, Y Fan, Jt Hu, F Yu, Yf Hu
DOI: https://doi.org/10.1007/3-540-45151-X_20
2000-01-01
Abstract: This paper reports a system that hierarchically classifies Chinese Web documents without dictionary support or a segmentation procedure. In our classifier, Web documents are represented by N-grams (N ≤ 4), which are easy to extract. A boosting machine learning approach is applied to classify Chinese Web documents that share a topic hierarchy. The open, modular system architecture makes our classifier extensible. Experimental results show that our system can classify Chinese Web documents effectively and efficiently.
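To illustrate what a segmentation-free N-gram representation might look like, the following is a minimal sketch, not the authors' implementation. It counts every character N-gram with N from 1 to 4 inside runs of Chinese characters. The function name `char_ngrams`, the restriction to the basic CJK Unified Ideographs range, and the choice to break N-grams at punctuation are all assumptions made for illustration; the abstract does not describe these details.

```python
from collections import Counter


def char_ngrams(text, max_n=4):
    """Count character N-grams (1 <= N <= max_n) without word segmentation.

    Hypothetical helper: the filtering rules here are assumptions,
    not taken from the paper.
    """
    # Split the text into runs of CJK ideographs so that N-grams
    # never span punctuation, Latin letters, or whitespace.
    runs, current = [], []
    for ch in text:
        if "\u4e00" <= ch <= "\u9fff":  # basic CJK Unified Ideographs block
            current.append(ch)
        elif current:
            runs.append("".join(current))
            current = []
    if current:
        runs.append("".join(current))

    # Slide a window of each length 1..max_n over every run.
    counts = Counter()
    for run in runs:
        for n in range(1, max_n + 1):
            for i in range(len(run) - n + 1):
                counts[run[i:i + n]] += 1
    return counts


features = char_ngrams("中文网页，分类")
```

The resulting counts could serve as a sparse feature vector for a classifier. No dictionary lookup is involved: overlapping N-grams such as 中文 and 文网 are both kept, and it is left to the learner to weight the informative ones.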