Guide Focused Crawler Efficiently and Effectively Using On-Line Topical Importance Estimation

Ziyu Guan, Can Wang, Chun Chen, Jiajun Bu, Junfeng Wang
DOI: https://doi.org/10.1145/1390334.1390488
2008-01-01
Abstract: Focused crawling is a critical technique for topical resource discovery on the Web. We propose a new frontier prioritizing algorithm, namely, the OTIE (On-line Topical Importance Estimation) algorithm, which efficiently and effectively combines link-based and content-based analysis to evaluate the priority of an uncrawled URL in the frontier. We then demonstrate OTIE's advantages over traditional prioritizing algorithms by real crawling experiments.