An Adaptive Focused Crawling Algorithm Based on Link and Content Analysis

Qing-sheng ZHU, Ning XU, Yu ZHOU
DOI: https://doi.org/10.3969/j.issn.1006-2475.2015.09.016
2015-01-01
Abstract: Focused crawling is a key technique for focused (topic-specific) search engines. To address the incomplete set of parameters considered by the On-line Topical Importance Estimation (OTIE) algorithm, this paper proposes an adaptive algorithm that combines link analysis with content analysis to estimate the priority of unvisited URLs in the frontier. We also consider the tunneling problem that arises during topical crawling. We select topics and seed pages from the Open Directory Project (ODP) and run comparative experiments with four crawling algorithms: Best-First, Shark-Search, OTIE, and the proposed algorithm. The experimental results indicate that the proposed method improves focused-crawler performance: it significantly outperforms the other three algorithms on average target recall while maintaining an acceptable harvest rate.
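To make the idea of frontier prioritization concrete, the following is a minimal Python sketch of a priority queue over unvisited URLs whose score blends a content signal with a link signal. It is not the paper's formula and not OTIE: the abstract does not give the scoring function, so the cosine similarity over anchor text, the inherited `parent_score`, the weight `alpha`, and the names `priority` and `Frontier` are all assumptions chosen for illustration. Tunneling is not modeled here.

```python
import heapq
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def priority(topic: str, anchor_text: str, parent_score: float,
             alpha: float = 0.5) -> float:
    """Blend a content signal (anchor text vs. topic) with a link signal
    (relevance inherited from the parent page).

    alpha is a hypothetical mixing weight, not a value from the paper.
    """
    content = cosine(Counter(topic.lower().split()),
                     Counter(anchor_text.lower().split()))
    return alpha * content + (1.0 - alpha) * parent_score


class Frontier:
    """Max-priority queue of unvisited URLs.

    heapq is a min-heap, so scores are negated on insertion.
    """

    def __init__(self):
        self._heap = []
        self._seen = set()

    def push(self, url: str, score: float) -> None:
        # Ignore URLs that were already queued once.
        if url not in self._seen:
            self._seen.add(url)
            heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self) -> int:
        return len(self._heap)


frontier = Frontier()
topic = "focused crawling"
# Link with on-topic anchor text found on a weakly relevant parent page.
frontier.push("http://example.com/a",
              priority(topic, "focused crawling survey", parent_score=0.2))
# Link with off-topic anchor text found on a highly relevant parent page.
frontier.push("http://example.com/b",
              priority(topic, "contact us", parent_score=0.9))
first_url, first_score = frontier.pop()
```

With equal weighting, the link whose anchor text matches the topic is dequeued first even though its parent page scored lower, which shows how a content signal can override a purely link-based ranking. An adaptive crawler would presumably adjust the weighting during the crawl; in this sketch `alpha` is fixed.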