Web Page Quality Estimation Based on Linear Discriminant Function

Rongwei Cen, Yiqun Liu, Min Zhang, Liyun Ru, Shaoping Ma
2007-01-01
Abstract: As web data grows, estimating web page quality effectively and rapidly becomes increasingly important for web information retrieval and knowledge discovery. This paper analyzes the differences between retrieval target pages and ordinary pages using query-independent features. Based on these features, an algorithm called Linear Page Estimation (LPE) is proposed for web page quality estimation. In experiments on the .GOV and SOGOU corpora, involving 26 million pages, the algorithm removes about 95% of pages while retaining more than 90% of retrieval target pages. Experimental results on TREC datasets also show that retrieval performance on collections selected by the algorithm can approach or even exceed that on the whole collection.
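To make the idea of a linear discriminant function over query-independent features concrete, the following is a minimal sketch, not the LPE algorithm itself. The abstract does not name the features, weights, or threshold, so the feature names (`in_link_count_log`, `url_depth`, `doc_length_log`), the weight values, the bias, and the helper `select_pages` are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List

# Hypothetical weights and bias: the abstract does not specify the features
# or learned parameters used by LPE, so these values are illustrative only.
WEIGHTS: Dict[str, float] = {
    "in_link_count_log": 0.8,   # assumed: more in-links suggests higher quality
    "url_depth": -0.5,          # assumed: deeper URLs are less likely targets
    "doc_length_log": 0.3,      # assumed: very short pages score lower
}
BIAS = -2.0


def discriminant(features: Dict[str, float]) -> float:
    """Linear discriminant g(x) = w . x + b over query-independent features."""
    return sum(w * features.get(name, 0.0) for name, w in WEIGHTS.items()) + BIAS


def select_pages(pages: Dict[str, Dict[str, float]], threshold: float = 0.0) -> List[str]:
    """Keep only pages whose discriminant score exceeds the threshold."""
    return [url for url, feats in pages.items() if discriminant(feats) > threshold]


# Toy collection with made-up feature values.
pages = {
    "site-home": {"in_link_count_log": 5.0, "url_depth": 1.0, "doc_length_log": 3.0},
    "deep-leaf": {"in_link_count_log": 0.5, "url_depth": 4.0, "doc_length_log": 2.0},
}

kept = select_pages(pages)
print(kept)
```

In a sketch like this, raising `threshold` shrinks the selected collection at the cost of discarding more retrieval target pages, which is the trade-off the reported reduction and retention figures describe.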