Selective Recrawling for Object-Level Vertical Search.

Yaqian Zhou, Mengjing Jiang, Qi Zhang, Xuanjing Huang, Lide Wu
DOI: https://doi.org/10.1145/1772690.1772884
2010-01-01
Abstract: In this paper we propose Selective Recrawling, a novel recrawling method based on navigation patterns. The goal of selective recrawling is to automatically select page collections that have large coverage and little redundancy with respect to a pre-defined vertical domain. It requires only a few seed objects and selects a set of URL patterns that covers most objects. The selected set can be used to recrawl web pages for an extended period of time and is renewed periodically. Experiments on local event data show that our method greatly reduces the number of web pages downloaded while keeping comparable object coverage.
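
The abstract does not say how the URL patterns are chosen. As an illustration only, the stated goal of large coverage with little redundancy can be read as a set-cover problem, and a greedy heuristic is one plausible way to approach it. The sketch below is not the authors' algorithm; the function name `select_patterns`, the `target_coverage` parameter, and the example patterns and object IDs are all hypothetical.

```python
from typing import Dict, List, Set


def select_patterns(
    pattern_objects: Dict[str, Set[str]],
    target_coverage: float = 0.9,
) -> List[str]:
    """Greedily pick URL patterns until enough objects are covered.

    pattern_objects maps a URL pattern to the set of object IDs found on
    pages matching it (an assumed input; the abstract does not define one).
    """
    all_objects: Set[str] = set().union(*pattern_objects.values())
    covered: Set[str] = set()
    selected: List[str] = []

    while all_objects and len(covered) / len(all_objects) < target_coverage:
        # Choose the pattern that adds the most not-yet-covered objects.
        # Sorting the keys makes tie-breaking deterministic.
        best = max(
            sorted(pattern_objects),
            key=lambda p: len(pattern_objects[p] - covered),
        )
        gain = pattern_objects[best] - covered
        if not gain:
            break  # no remaining pattern adds anything new
        selected.append(best)
        covered |= gain

    return selected


# Hypothetical example: "/calendar/*" is redundant with "/events/*".
example = {
    "/events/*": {"e1", "e2", "e3"},
    "/calendar/*": {"e2", "e3"},
    "/venue/*/events": {"e4"},
}
print(select_patterns(example, target_coverage=1.0))
```

In this toy input the redundant pattern is never selected, because once "/events/*" is chosen it contributes no new objects. A real system would also have to weigh download cost per pattern and object freshness, which this sketch ignores.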