A FAST ALGORITHM FOR LARGE SCALE WEB PAGE CLASSIFICATION

Miao Youdong, Qiu Xipeng, Huang Xuanjing
DOI: https://doi.org/10.3969/j.issn.1000-386X.2012.07.075
2012-01-01
Abstract: Web page classification typically involves too many categories and too few training samples, so conventional classifiers perform poorly in practice. To address this problem, a centroid-based classification method is presented. The centroid-based algorithm not only achieves very good classification performance with fewer manually annotated labels, but also significantly improves training and prediction speed by exploiting the hierarchical category information of web pages. Compared with other methods that participated in the 1st LSHTC evaluation, experimental results show that the centroid-based algorithm achieves very fast training and prediction with competitive accuracy.
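The abstract gives no implementation details, so the following is only a minimal sketch of how a centroid-based classifier might use a category hierarchy. It assumes sparse term-weight vectors, cosine similarity, and a greedy top-down descent through the category tree; none of these choices are confirmed by the paper. All names (`train_centroids`, `predict_hierarchical`, the toy categories and documents) are hypothetical and chosen for illustration.

```python
import math
from collections import defaultdict

def normalize(vec):
    """Scale a sparse vector (dict term -> weight) to unit L2 length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)

def cosine(unit_doc, unit_centroid):
    """Dot product of two unit vectors, i.e. their cosine similarity."""
    return sum(w * unit_centroid.get(t, 0.0) for t, w in unit_doc.items())

def train_centroids(samples, parent, root="root"):
    """Build one unit-length centroid per category in a single pass.

    Each document is added to its own category and to every ancestor
    below the root, so internal nodes summarize their whole subtree.
    """
    sums = defaultdict(lambda: defaultdict(float))
    for features, label in samples:
        unit = normalize(features)
        node = label
        while node != root:
            for term, weight in unit.items():
                sums[node][term] += weight
            node = parent[node]
    return {node: normalize(vec) for node, vec in sums.items()}

def predict_hierarchical(doc, centroids, children, root="root"):
    """Descend from the root, picking the most similar child at each level.

    Only the children of the current node are scored, so far fewer
    centroids are compared than in a flat search over all leaves.
    """
    unit = normalize(doc)
    node = root
    while children.get(node):
        node = max(children[node], key=lambda c: cosine(unit, centroids[c]))
    return node

# Toy hierarchy and training data (illustrative only, not from the paper).
parent = {
    "sports": "root", "tech": "root",
    "football": "sports", "tennis": "sports",
    "hardware": "tech", "software": "tech",
}
children = defaultdict(list)
for child, par in parent.items():
    children[par].append(child)

samples = [
    ({"goal": 2, "match": 1}, "football"),
    ({"racket": 2, "match": 1}, "tennis"),
    ({"cpu": 2, "code": 1}, "hardware"),
    ({"code": 2, "bug": 1}, "software"),
]
centroids = train_centroids(samples, parent)
```

In this sketch training is a single pass that sums document vectors, and prediction scores only the children of the current node at each level; this is one plausible reading of how hierarchy information could speed up both phases. A known trade-off of greedy descent is that a wrong choice at an upper level cannot be corrected further down.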