Research on Feature Extraction for Content-based Chinese Web Pages Analysis

Yizhong Zhang, Mingsheng Zhao, Jingnan Zhu
DOI: https://doi.org/10.3321/j.issn:1002-8331.2001.10.001
2001-01-01
Abstract: This paper presents a feature framework for content-based analysis and searching of Chinese web pages. The method for constructing a segmentation keyword dictionary is introduced first. The keywords in the dictionary are those words that represent the contents and concepts of web pages in a certain domain. Feature extraction methods for text, tag information, and hyperlink information are then addressed. Experiments on Chinese travel web pages show that the proposed methods work very well.
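As a rough illustration of how a domain keyword dictionary could drive text feature extraction, the sketch below segments a Chinese string by forward maximum matching and counts the keywords found. The abstract does not state which segmentation algorithm the paper uses, so forward maximum matching is an assumption here, and the function names, the dictionary entries, and the sample sentence are all hypothetical.

```python
from collections import Counter


def segment_keywords(text, dictionary):
    """Return dictionary keywords found in text, in order of appearance.

    Assumption: forward maximum matching, i.e. at each position the
    longest dictionary word starting there is taken.
    """
    max_len = max((len(word) for word in dictionary), default=0)
    found = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then progressively shorter ones.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            found.append(match)
            i += len(match)
        else:
            i += 1  # no keyword starts here; skip one character
    return found


def keyword_features(text, dictionary):
    """Count keyword occurrences to form a simple text feature vector."""
    return Counter(segment_keywords(text, dictionary))


# Hypothetical travel-domain dictionary and sample sentence.
TRAVEL_KEYWORDS = {"旅游", "酒店", "北京", "长城", "北京饭店"}
sample = "北京饭店靠近长城，是北京旅游的热门酒店"
features = keyword_features(sample, TRAVEL_KEYWORDS)
```

Because the longest match wins, the leading "北京饭店" is kept as one keyword rather than split into "北京" plus an unmatched remainder. Tag and hyperlink features would need separate handling of the page markup, which this sketch does not attempt.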