Chinese Web Page Classification Based on Statistical Word Segmentation

黄科,马少平
DOI: https://doi.org/10.1109/icsmc.2002.1168050
2002-01-01
Abstract: Word segmentation is an important step in Chinese natural language processing. This paper explores the problem of classifying Chinese web pages based on statistical word segmentation. We first construct a Chinese word list of binary words automatically from training Chinese web pages. Then the texts in testing Chinese web pages are segmented with the word list. Web pages are classified based on the segmentation results. Experiments show that statistical word segmentation can efficiently improve classification precision. Based on the experiment results, we analyze the influence of statistical word segmentation on Chinese web page classification. Single Chinese characters and words play different roles in web page classification, and the reason for the difference is also analyzed.
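The first two steps described in the abstract (building a list of two-character words from training pages, then segmenting test text with that list) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' method: the abstract does not say which statistic selects the words, so a plain frequency threshold is assumed here, and the names `build_word_list`, `segment`, `is_cjk`, and `min_count` are hypothetical. The segmenter is a simple greedy left-to-right matcher that falls back to single characters.

```python
from collections import Counter


def is_cjk(ch):
    # Assumption: only the basic CJK Unified Ideographs block is considered.
    return "\u4e00" <= ch <= "\u9fff"


def build_word_list(training_texts, min_count=2):
    """Collect adjacent character pairs that occur at least min_count times.

    Assumption: raw frequency stands in for whatever statistic the paper uses.
    """
    counts = Counter()
    for text in training_texts:
        for a, b in zip(text, text[1:]):
            if is_cjk(a) and is_cjk(b):
                counts[a + b] += 1
    return {word for word, count in counts.items() if count >= min_count}


def segment(text, word_list):
    """Greedy left-to-right segmentation into listed two-character words,
    falling back to single Chinese characters; other characters are skipped."""
    tokens = []
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if len(pair) == 2 and pair in word_list:
            tokens.append(pair)
            i += 2
        else:
            if is_cjk(text[i]):
                tokens.append(text[i])
            i += 1
    return tokens


# Toy usage with made-up training snippets.
training = ["北京大学", "北京天气", "大学生活"]
word_list = build_word_list(training, min_count=2)
tokens = segment("北京的大学", word_list)
print(tokens)  # ['北京', '的', '大学']
```

The resulting tokens (a mix of two-character words and single characters) would then serve as features for whichever classifier is used; the abstract does not name one, so that step is left out of the sketch.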