Research on Web Page Automatic Categorization Based on Structural and Text Information

GU Min, GUO Qing, CAO Ye, ZHU Feng, GU Yanhui, ZHOU Junsheng, QU Weiguang
DOI: https://doi.org/10.3969/j.issn.0253-2778.2017.04.002
2017-01-01
Journal of University of Science and Technology of China
Abstract: Web pages contain abundant information resources, and categorizing them allows that information to be extracted and managed more effectively. Because web pages combine complex structure with abundant text, a categorization method based on both structural and text information is proposed. The method classifies web pages using a combination of joint features and atomic features. Experimental results show that the proposed method is feasible to some extent and achieves higher precision and recall than using text information alone.
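The abstract does not define its atomic and joint features, so the following is only an illustrative sketch of one way structural and text information might be combined. It assumes, purely for illustration, that an atomic feature is a single tag or a single word, and that a joint feature pairs a word with the tag that encloses it. The names `FeatureExtractor` and `extract_features` are hypothetical and do not come from the paper.

```python
from collections import Counter
from html.parser import HTMLParser


class FeatureExtractor(HTMLParser):
    """Counts assumed atomic (tag, word) and joint (tag & word) features."""

    def __init__(self):
        super().__init__()
        self.stack = []            # tags that are currently open
        self.features = Counter()  # feature name -> occurrence count

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.features[f"tag={tag}"] += 1  # atomic structural feature

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        enclosing = self.stack[-1] if self.stack else "root"
        for word in data.lower().split():
            self.features[f"word={word}"] += 1                   # atomic text feature
            self.features[f"tag={enclosing}&word={word}"] += 1   # joint feature


def extract_features(html):
    """Return a Counter of atomic and joint features for one HTML page."""
    parser = FeatureExtractor()
    parser.feed(html)
    parser.close()
    return parser.features


page = "<html><body><h1>Sports news</h1><p>football news</p></body></html>"
features = extract_features(page)
```

Under these assumptions, the word "news" contributes one atomic text feature counted twice, plus two distinct joint features (one under `h1`, one under `p`), which lets a downstream classifier weigh a heading word differently from a body word. The resulting counts could be fed to any standard text classifier; which classifier the paper uses is not stated in the abstract.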