New Automatic Categorization Algorithm for Chinese Homepages

ZHANG Li, LI Xing, LU Dajin
DOI: https://doi.org/10.3321/j.issn:1000-0054.2000.01.012
2000-01-01
Abstract: Abundant resources are now accessible on the Internet, but there is no effective method for organizing the information. Based on an analysis of the characteristics of Chinese text and Chinese homepages, a new automatic categorization method for Chinese homepages is presented. The method combines the Chinese characters, the term frequency, and the hypertext markup language (HTML) tag information in a homepage to calculate an adjustable term frequency weighting parameter. An expert database is built by training on both in-set and out-of-set samples. Experiments show that the method's recognition rate is about 80%.
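
The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of tag-weighted term frequency, not the authors' algorithm. It treats each Chinese character as a term and scales its count by the weight of the enclosing HTML tag. The tag weights, the `is_cjk` range check, and the function names are all illustrative assumptions.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag weights; the abstract does not state the actual values.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "b": 1.5}
DEFAULT_WEIGHT = 1.0


def is_cjk(ch):
    """Return True for characters in the basic CJK Unified Ideographs block."""
    return "\u4e00" <= ch <= "\u9fff"


class WeightedTermCounter(HTMLParser):
    """Accumulate per-character counts scaled by the enclosing tag's weight."""

    def __init__(self, tag_weights=None):
        super().__init__()
        self.tag_weights = TAG_WEIGHTS if tag_weights is None else tag_weights
        self.stack = []           # currently open tags
        self.weights = Counter()  # character -> weighted frequency

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching open tag, tolerating sloppy HTML.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        # Use the largest weight among the open tags (an assumed policy).
        weight = max(
            (self.tag_weights.get(t, DEFAULT_WEIGHT) for t in self.stack),
            default=DEFAULT_WEIGHT,
        )
        for ch in data:
            if is_cjk(ch):
                self.weights[ch] += weight


def weighted_term_frequencies(html, tag_weights=None):
    """Return a dict mapping each Chinese character to its weighted frequency."""
    parser = WeightedTermCounter(tag_weights)
    parser.feed(html)
    parser.close()
    return dict(parser.weights)
```

Passing a different `tag_weights` mapping stands in for the "adjustable" aspect of the weighting parameter. How the paper actually tunes the weights, and how the resulting vectors are matched against the expert database, is not specified in the abstract.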