A Web Crawler System Design Based on Distributed Technology

Shaojun Zhong, Zhijuan Deng
DOI: https://doi.org/10.4304/jnw.6.12.1682-1689
2011-01-01
Journal of Networks
Abstract: A practical distributed web crawler architecture is designed. A distributed cooperative crawling algorithm is proposed to address the problem of crawling in a distributed web crawler. A log structure and a hash structure are combined into a large-scale web page store that can serve a large number of random accesses while also accommodating newly added pages. Experimental results show that the distributed web crawler achieves better performance, scalability, and load balance.
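The abstract does not give the details of the store, so the following is only a minimal sketch of the general idea of pairing a log with a hash index, not the authors' design. Page bodies are appended to a log, which keeps adding new pages cheap, and a hash table maps each URL to its position in the log, which keeps random reads cheap. The class name `LogHashPageStore`, its methods, the SHA-1 key, and the example URLs are all assumptions made for illustration.

```python
import hashlib
import io


class LogHashPageStore:
    """Hypothetical sketch: append-only page log plus an in-memory hash index."""

    def __init__(self, log_file):
        self._log = log_file   # any seekable binary file object
        self._index = {}       # URL digest -> (offset, length) in the log

    @staticmethod
    def _key(url):
        # Fixed-size key for the hash index (SHA-1 is an assumption here).
        return hashlib.sha1(url.encode("utf-8")).digest()

    def put(self, url, body):
        # New and updated pages are always appended to the end of the log.
        self._log.seek(0, io.SEEK_END)
        offset = self._log.tell()
        self._log.write(body)
        # The index points at the latest version of the page.
        self._index[self._key(url)] = (offset, len(body))

    def get(self, url):
        # Random access: one hash lookup, then one seek and read.
        entry = self._index.get(self._key(url))
        if entry is None:
            return None
        offset, length = entry
        self._log.seek(offset)
        return self._log.read(length)


store = LogHashPageStore(io.BytesIO())
store.put("http://example.com/a", b"<html>A</html>")
store.put("http://example.com/b", b"<html>B</html>")
store.put("http://example.com/a", b"<html>A2</html>")  # re-crawled page
```

In this sketch a re-crawled page leaves its old copy in the log and only the index entry moves; a real store would also need to persist the index and reclaim stale log space, neither of which is shown here.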