wHunter: a focused web crawler – a tool for digital library

Yun Huang, YunMing Ye
DOI: https://doi.org/10.1007/978-3-540-30544-6_59
2004-01-01
Abstract: A topic-driven web crawler, or focused crawler, is the key tool for building an on-line web information library. Achieving good performance efficiently with limited time and space resources is a challenging problem. This paper proposes wHunter, a focused web crawler that implements incremental, multi-strategy learning by combining the advantages of SVM (support vector machines) and naïve Bayes. On the one hand, an SVM classifier guarantees good initial performance; on the other hand, once enough web pages have been obtained, the crawler switches to a naïve Bayes classifier, which supports on-line incremental learning. Experimental results show that the proposed algorithm is efficient and easy to implement.
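The abstract states the strategy (an SVM for reliable initial performance, then a switch to naïve Bayes for on-line incremental learning) but not its details. Below is a minimal Python sketch of that idea under assumptions of our own: pages are bags of words, the SVM is a bias-free linear SVM trained with Pegasos-style sub-gradient steps, the switch fires after a fixed number of labeled pages (`switch_after`), and labels for newly fetched pages are supplied from outside (for example, relevance feedback). All names (`bag`, `LinearSVM`, `IncrementalNaiveBayes`, `SwitchingClassifier`, `switch_after`) are hypothetical and are not taken from the paper.

```python
# Hypothetical sketch of SVM-then-naive-Bayes switching; not the paper's code.
import math
from collections import Counter


def bag(text):
    # Bag-of-words features: lowercase whitespace tokens -> term counts.
    return Counter(text.lower().split())


class LinearSVM:
    """Binary linear SVM (labels +1/-1, no bias), Pegasos-style hinge-loss updates."""

    def __init__(self, lam=0.01, epochs=20):
        self.lam, self.epochs, self.w = lam, epochs, {}

    def score(self, feats):
        return sum(self.w.get(term, 0.0) * c for term, c in feats.items())

    def fit(self, docs, labels):
        t = 0
        for _ in range(self.epochs):
            for feats, y in zip(docs, labels):
                t += 1
                eta = 1.0 / (self.lam * t)
                margin = y * self.score(feats)  # margin under the current weights
                for term in self.w:  # L2 shrinkage: w <- (1 - eta*lam) * w
                    self.w[term] *= 1.0 - eta * self.lam
                if margin < 1:  # hinge loss active: step toward y * x
                    for term, c in feats.items():
                        self.w[term] = self.w.get(term, 0.0) + eta * y * c
        return self

    def predict(self, feats):
        return 1 if self.score(feats) >= 0 else -1


class IncrementalNaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing, updated one page at a time."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.docs = Counter()  # labeled pages per class
        self.terms = {1: Counter(), -1: Counter()}  # term counts per class
        self.totals = Counter()  # total term count per class
        self.vocab = set()

    def update(self, feats, y):
        # Absorbing a page only adds counts, so no retraining is needed.
        self.docs[y] += 1
        self.terms[y].update(feats)
        self.totals[y] += sum(feats.values())
        self.vocab.update(feats)

    def log_posterior(self, feats, y):
        n = sum(self.docs.values())
        lp = math.log((self.docs[y] + self.alpha) / (n + 2 * self.alpha))
        denom = self.totals[y] + self.alpha * len(self.vocab)
        for term, c in feats.items():
            if term in self.vocab:  # unseen terms carry no class evidence
                lp += c * math.log((self.terms[y][term] + self.alpha) / denom)
        return lp

    def predict(self, feats):
        return 1 if self.log_posterior(feats, 1) >= self.log_posterior(feats, -1) else -1


class SwitchingClassifier:
    """SVM until `switch_after` (hypothetical threshold) pages are seen, then naive Bayes."""

    def __init__(self, seed_docs, seed_labels, switch_after=10):
        self.svm = LinearSVM().fit(seed_docs, seed_labels)
        self.nb = IncrementalNaiveBayes()
        for feats, y in zip(seed_docs, seed_labels):
            self.nb.update(feats, y)
        self.seen, self.switch_after = len(seed_docs), switch_after

    @property
    def active(self):
        return "nb" if self.seen >= self.switch_after else "svm"

    def learn(self, feats, y):
        # Only naive Bayes learns on-line; the SVM keeps its seed-trained weights.
        self.nb.update(feats, y)
        self.seen += 1

    def predict(self, feats):
        return (self.nb if self.active == "nb" else self.svm).predict(feats)


seed = [("digital library metadata catalog", 1), ("library archive digital collection", 1),
        ("catalog search digital archive", 1), ("football match score goal", -1),
        ("goal keeper football team", -1), ("team score match league", -1)]
clf = SwitchingClassifier([bag(t) for t, _ in seed], [y for _, y in seed], switch_after=8)
print(clf.active, clf.predict(bag("digital archive catalog")))  # svm 1
clf.learn(bag("digital repository metadata"), 1)
clf.learn(bag("league football goal"), -1)
print(clf.active, clf.predict(bag("football league goal")))  # nb -1
```

In this sketch the switch is easy to motivate: absorbing a new page into naive Bayes only adds term counts, whereas refreshing the SVM would mean re-running `fit` over every page collected so far. The trigger, features, and labeling source used by wHunter itself may differ.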