Study on Parallel Crawler Based on Pipeline Load Balancing Model

Xiang-qian MENG, Yun-ming YE, Bin DENG
DOI: https://doi.org/10.3969/j.issn.1000-3428.2009.02.012
2009-01-01
Abstract: This paper proposes a load balancing model named Pipeline Load Balancing (PLB) to address the load balancing problem among concurrent modules in a parallel crawling system. In PLB, different tasks are implemented as independent modules with similar processing abilities. Dynamic multi-threading and buffering mechanisms are employed to implement a PLB-based parallel crawler. The number of threads is adjusted according to changes in buffer size and in the waiting interval of a thread. Experimental results show that the PLB-based crawler provides high performance as well as good stability.
Key words: crawler; parallel; pipeline; load balancing
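The abstract says thread counts are adjusted from the buffer size and a thread's waiting interval, but it does not give the rule. The Python sketch below is one possible reading, not the authors' implementation. Each pipeline module is modelled as a hypothetical `Stage` that reads from a bounded `queue.Queue` buffer. A hypothetical `adjust_thread_count` rule adds a thread when the buffer backs up and removes one when the buffer is nearly empty and threads spend long periods waiting for input. The watermark and idle-wait thresholds are invented placeholders.

```python
import queue
import threading
import time
from typing import Any, Callable, List

# Hypothetical thresholds: the abstract does not state the paper's actual rule or values.
HIGH_WATERMARK = 0.75   # buffer fill ratio above which a stage is treated as a bottleneck
LOW_WATERMARK = 0.25    # buffer fill ratio below which a stage may have spare threads
MAX_IDLE_WAIT = 0.05    # mean seconds a thread waited for input before it counts as idle


def adjust_thread_count(current: int, buffered: int, capacity: int, avg_wait: float,
                        min_threads: int = 1, max_threads: int = 16) -> int:
    """Pick a new thread count from the buffer fill level and the thread waiting interval."""
    fill = buffered / capacity
    if fill > HIGH_WATERMARK:                               # input piling up: add a thread
        return min(current + 1, max_threads)
    if fill < LOW_WATERMARK and avg_wait > MAX_IDLE_WAIT:   # threads starved: drop one
        return max(current - 1, min_threads)
    return current                                          # balanced: leave unchanged


class Stage:
    """One pipeline module (e.g. fetch or parse) fed by a bounded buffer (hypothetical)."""

    def __init__(self, work: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue):
        if inbox.maxsize <= 0:
            raise ValueError("stage input buffers must be bounded (maxsize > 0)")
        self.work, self.inbox, self.outbox = work, inbox, outbox
        self.lock = threading.Lock()
        self.active = 0                 # threads currently running
        self.target = 0                 # threads the controller wants
        self.waits: List[float] = []    # waiting intervals observed since last rebalance

    def _worker(self) -> None:
        while True:
            with self.lock:
                if self.active > self.target:   # controller asked for fewer threads
                    self.active -= 1
                    return                      # exit between items, never mid-task
            start = time.monotonic()
            try:
                item = self.inbox.get(timeout=0.1)
            except queue.Empty:
                continue                        # nothing buffered; re-check target
            finally:
                with self.lock:                 # record how long this thread waited
                    self.waits.append(time.monotonic() - start)
            self.outbox.put(self.work(item))    # blocks if the next stage's buffer is full
            self.inbox.task_done()

    def resize(self, n: int) -> None:
        """Set the desired thread count; surplus threads exit on their next loop."""
        with self.lock:
            self.target = n
            missing = max(n - self.active, 0)
            self.active += missing
        for _ in range(missing):
            threading.Thread(target=self._worker, daemon=True).start()

    def rebalance(self) -> int:
        """Apply the adjustment rule once, using waits recorded since the last call."""
        with self.lock:
            avg_wait = sum(self.waits) / len(self.waits) if self.waits else 0.0
            self.waits.clear()
            current = self.target
        new = adjust_thread_count(current, self.inbox.qsize(), self.inbox.maxsize, avg_wait)
        self.resize(new)
        return new
```

In this sketch a controller thread would call `rebalance()` on every stage at a fixed interval. Chaining stages, so that one stage's `outbox` is the next stage's bounded `inbox`, provides backpressure: a slow module's buffer fills, which triggers extra threads for that module. Surplus threads retire voluntarily between items rather than being interrupted mid-download.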