DESIGN AND IMPLEMENTATION OF A NOVEL DISTRIBUTED WEB CRAWLER

Wu Libing, Ke Yalin, He Yanxiang, Liu Nan
DOI: https://doi.org/10.3969/j.issn.1000-386X.2011.11.045
2011-01-01
Abstract: This article presents DSpider, a novel distributed web crawler. DSpider can be deployed either within a single network domain or across multiple network domains. By adjusting its number of nodes and its connection-timeout threshold, it can also be deployed effectively in both LAN and WAN environments. The article first briefly introduces the system architecture of DSpider and then analyses its task scheduling strategy in detail. It also reports experiments that analyse in detail how DSpider performs when deployed in LAN and WAN environments.
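The abstract does not spell out DSpider's task scheduling strategy. The sketch below is an illustration only. It assumes a common scheme, hash-based partitioning of URLs by host, which is not necessarily the one the paper uses. It exposes the two deployment parameters the abstract names: the number of nodes and the connection timeout. The names `CrawlerConfig`, `assign_node`, `partition` and `fetch` are hypothetical and do not come from the paper.

```python
import hashlib
from dataclasses import dataclass
from urllib.parse import urlsplit
from urllib.request import urlopen


@dataclass(frozen=True)
class CrawlerConfig:
    # Hypothetical knobs mirroring the two parameters the abstract names:
    # how many crawler nodes there are and the connection-timeout threshold.
    num_nodes: int
    connect_timeout_s: float


def assign_node(url: str, num_nodes: int) -> int:
    """Map a URL to a node index by hashing its host name (assumed scheme).

    All URLs from one host land on the same node, which keeps per-host
    politeness and DNS caching local to one node. MD5 serves only as a
    stable hash (Python's built-in hash() of str is randomized per process).
    """
    host = urlsplit(url).hostname or ""  # .hostname is already lowercased
    digest = hashlib.md5(host.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


def partition(urls, cfg: CrawlerConfig) -> dict[int, list[str]]:
    """Group URLs into one task queue per node."""
    queues: dict[int, list[str]] = {i: [] for i in range(cfg.num_nodes)}
    for url in urls:
        queues[assign_node(url, cfg.num_nodes)].append(url)
    return queues


def fetch(url: str, cfg: CrawlerConfig) -> bytes:
    # urlopen's timeout (seconds) bounds blocking socket operations,
    # including connection setup; a WAN deployment would typically need
    # a larger value than a LAN deployment.
    with urlopen(url, timeout=cfg.connect_timeout_s) as resp:
        return resp.read()
```

For example, `partition(["http://example.com/a", "http://example.com/b", "http://example.org/"], CrawlerConfig(num_nodes=4, connect_timeout_s=5.0))` places both `example.com` URLs in the same node's queue. Note that with plain modulo hashing, changing `num_nodes` reassigns most hosts to different nodes. A real deployment that resizes often might prefer consistent hashing instead.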