Ontology-based focused crawler

Gechao Lu, Wanli Zuo, Aiqi Zhang, Ying Wang, Wenyan Ji
2010-01-01
Journal of Information and Computational Science
Abstract: This paper studies how to make a focused crawler collect topical pages effectively and accurately. We analyze the inadequacies of traditional methods and present a model for extracting feature vectors. We present a second model, based on an ontology, for calculating the semantic similarity between pages. We then build a focused crawler that combines these two models with the Best-First strategy of [1] and uses the URL distribution strategy of [2]. Experimental results show that the two models are effective and accurate for feature vector extraction and similarity calculation, which indicates that the proposed ontology-based focused crawler is feasible for focused crawling. Copyright © 2010 Binary Information Press.
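To make the Best-First idea concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not specify the feature-vector model or the ontology-based similarity, so a plain bag-of-words vector and cosine similarity stand in for them. The names `feature_vector`, `cosine_similarity`, `best_first_crawl`, `toy_fetch`, and the in-memory `TOY_WEB` link graph are all hypothetical, and the URL distribution strategy of [2] is not modeled.

```python
import heapq
import math
from collections import Counter


def feature_vector(text):
    """Bag-of-words term counts (an assumed stand-in for the paper's extraction model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity of two term-count vectors (assumed stand-in for the ontology-based measure)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_first_crawl(seeds, fetch, topic_vector, max_pages=10):
    """Visit pages in order of estimated relevance.

    Each newly discovered link inherits the relevance score of the page
    it was found on; the frontier always yields the highest-scored URL.
    """
    # heapq is a min-heap, so scores are stored negated.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        text, links = fetch(url)
        score = cosine_similarity(feature_vector(text), topic_vector)
        visited.append((url, score))
        for link in links:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))
    return visited


# Hypothetical in-memory "web": url -> (page text, outgoing links).
TOY_WEB = {
    "a": ("ontology crawler semantic web", ["b", "c"]),
    "b": ("football scores and news", ["d"]),
    "c": ("semantic web ontology crawler", ["e"]),
    "d": ("cooking recipes", []),
    "e": ("focused crawler ontology", []),
}


def toy_fetch(url):
    """Return (text, links) for a URL in the toy graph; no network access."""
    return TOY_WEB[url]


topic = feature_vector("ontology focused crawler semantic web")
order = [url for url, _ in best_first_crawl(["a"], toy_fetch, topic)]
print(order)
```

In this toy graph the link found on the on-topic page `c` is visited before the link found on the off-topic page `b`, which is the behavior a Best-First frontier is meant to produce. A real crawler would replace `toy_fetch` with HTTP retrieval and link extraction.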