Web Page Classification Based on Extracting Hierarchy from Web Site

DENG Jian-shuang, ZHENG Qi-lun, PENG Hong
2006-01-01
Journal of Computer Applications
Abstract: Web page classification is currently one of the most active research problems in Internet search. Existing classifiers are based on page text and hyperlinks, but these methods use only the information in individual pages and ignore the information provided by the web site as a whole. This article presents a new algorithm that simplifies the topological structure of a web site and extracts its implicit classification hierarchy to build a classification tree, through which multi-level classification can be achieved. The method has been applied successfully in an intelligent search and mining system for electronic business.
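
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of reducing a site's link graph to a tree and reading a multi-level category off it. It assumes, purely for illustration, that the simplification keeps for each page the link from the page that first reaches it in a breadth-first traversal from the home page; the function names and the sample site are hypothetical and are not taken from the paper.

```python
from collections import deque


def extract_hierarchy(links, root):
    """Reduce a site link graph to a tree (assumed simplification rule).

    `links` maps each page to the pages it links to. Every page keeps only
    the link from the page that first reaches it in a breadth-first
    traversal from `root`, so back links and cross links are dropped.
    Returns a dict mapping each reachable page to its parent.
    """
    parent = {root: None}
    queue = deque([root])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in parent:
                parent[target] = page
                queue.append(target)
    return parent


def category_path(parent, page):
    """Return the chain of pages from the root down to `page`."""
    path = []
    while page is not None:
        path.append(page)
        page = parent[page]
    return path[::-1]


# Hypothetical site: navigation links plus back links and one cross link.
site_links = {
    "/": ["/books", "/music"],
    "/books": ["/books/fiction", "/", "/music"],
    "/books/fiction": ["/books/fiction/item1", "/books"],
    "/music": ["/music/jazz", "/"],
    "/music/jazz": ["/books/fiction/item1"],
}

tree = extract_hierarchy(site_links, "/")
item_path = category_path(tree, "/books/fiction/item1")
```

In this sketch each level of the returned path acts as one level of the classification, so a page is assigned a coarse category near the root and finer categories further down. A real system would still need a text or hyperlink classifier to label the nodes of the tree.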