Research and Realization of Intelligent Focused Web Crawler

LI Wei, LIU Jian-yi, HE Hua-can, WANG Cong
DOI: https://doi.org/10.3969/j.issn.1001-3695.2006.02.052
2006-01-01
Abstract: An Intelligent Focused Web Crawler (IFWC) is investigated in detail. Supported by comprehensive information theory, a concept-based vector space model (VSM) is applied to filter the crawled web pages, while an extended metadata semantic relevance algorithm is used to predict the relevance between a URL and the topic. Experimental results show that IFWC retrieves web pages relevant to a predefined set of topics with higher accuracy and wider coverage.
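
The two stages named in the abstract, filtering fetched pages and predicting the relevance of a URL before it is fetched, can be illustrated with a minimal sketch. This is not the authors' implementation: it swaps in a plain term-frequency VSM with cosine similarity for the paper's concept-based VSM, and approximates URL relevance from anchor text and the URL string rather than from the extended metadata algorithm. The names `term_vector`, `cosine`, `Frontier`, `is_relevant`, and the threshold value are all hypothetical.

```python
import heapq
import math
import re
from collections import Counter


def term_vector(text):
    """Map text to a bag-of-words term-frequency vector (assumed representation)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class Frontier:
    """URL queue that pops the link with the highest predicted relevance first."""

    def __init__(self, topic_text):
        self.topic = term_vector(topic_text)
        self._heap = []
        self._seen = set()

    def add(self, url, anchor_text):
        if url in self._seen:
            return
        self._seen.add(url)
        # Assumption: relevance is predicted from anchor text plus the URL string.
        score = cosine(self.topic, term_vector(anchor_text + " " + url))
        # heapq is a min-heap, so store the negated score.
        heapq.heappush(self._heap, (-score, url))

    def pop(self):
        neg_score, url = heapq.heappop(self._heap)
        return url, -neg_score


def is_relevant(page_text, topic_text, threshold=0.2):
    """Filter step: keep a fetched page only if it is close enough to the topic."""
    return cosine(term_vector(page_text), term_vector(topic_text)) >= threshold
```

In such a design, a crawl loop would pop the best-scoring URL from the frontier, fetch it, discard the page if `is_relevant` returns `False`, and otherwise add its outgoing links back to the frontier. The threshold of 0.2 is an arbitrary illustrative value, not one reported in the paper.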