The Automatic Classification Of Web Pages Based On Neural Network

Yizhong Zhang, Mingsheng Zhao, Youshou Wu
2001-01-01
Abstract: Classifying web pages is an important task. It may also require a technique for extracting field information as common knowledge, and compound-word processing is an important factor when extracting keywords from web pages. In this method, the tour fields are first defined systematically, and the information related to each field is then extracted. A new feature-extraction method is considered that combines three kinds of information: text, HTML tags, and hyperlinks. Accordingly, this paper presents a neural network algorithm, the self-organizing feature map, for the automatic classification of web pages. The proposed approach combines a new feature set with a self-organizing neural network classifier. The feature set reflects page content, is selected by a statistical reduction procedure, and carries text keyword, hyperlink, and HTML tag information. The final feature set is used as the input vector to the neural network, which assigns web pages to different classes. A series of experiments was conducted to evaluate the approach, and the results are promising.
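
The pipeline described in the abstract (text, HTML tag, and hyperlink features fed into a self-organizing feature map) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `KEYWORDS` and `TAGS` lists, the function names, the 1-D map with four units, and the learning-rate and neighborhood schedules. The statistical reduction step that selects the final feature set is not shown.

```python
import math
import random
from html.parser import HTMLParser

# Hypothetical vocabulary and tag set; the paper's actual features are not given.
KEYWORDS = ["hotel", "flight", "price", "news"]
TAGS = ["a", "img", "table", "h1"]


class PageFeatures(HTMLParser):
    """Collects text tokens, tag counts, and a hyperlink count from one page."""

    def __init__(self):
        super().__init__()
        self.words, self.tags, self.links = [], {}, 0

    def handle_starttag(self, tag, attrs):
        self.tags[tag] = self.tags.get(tag, 0) + 1
        if tag == "a" and any(name == "href" for name, _ in attrs):
            self.links += 1

    def handle_data(self, data):
        self.words.extend(data.lower().split())


def feature_vector(html):
    """Builds one unit-length vector from keyword, tag, and link counts."""
    parser = PageFeatures()
    parser.feed(html)
    parser.close()
    raw = [parser.words.count(k) for k in KEYWORDS]
    raw += [parser.tags.get(t, 0) for t in TAGS]
    raw.append(parser.links)
    norm = math.sqrt(sum(v * v for v in raw)) or 1.0  # avoid dividing by zero
    return [v / norm for v in raw]


def best_unit(weights, x):
    """Returns the index of the map unit whose weights are closest to x."""
    dists = [sum((w - v) ** 2 for w, v in zip(unit, x)) for unit in weights]
    return dists.index(min(dists))


def train_som(vectors, n_units=4, epochs=50, seed=0):
    """Trains a 1-D self-organizing map with a shrinking Gaussian neighborhood."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = [[rng.random() for _ in range(dim)] for _ in range(n_units)]
    for epoch in range(epochs):
        frac = 1.0 - epoch / epochs          # decays from 1 toward 0
        lr = 0.5 * frac                      # assumed learning-rate schedule
        radius = max(n_units / 2 * frac, 0.5)
        for x in vectors:
            bmu = best_unit(weights, x)
            for i, unit in enumerate(weights):
                # Units near the winner on the map move more strongly toward x.
                h = math.exp(-((i - bmu) ** 2) / (2 * radius ** 2))
                for d in range(dim):
                    unit[d] += lr * h * (x[d] - unit[d])
    return weights
```

After training, `best_unit(weights, feature_vector(page))` gives the map unit for a page, and pages that land on the same or neighboring units can be treated as one class. Because a self-organizing map is unsupervised, attaching class names to units would need a separate labeling step.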