The Design and Implementation of a High-Efficiency Distributed Web Crawler

Qiumei Pu
DOI: https://doi.org/10.1109/dasc-picom-datacom-cyberscitec.2016.34
2016-01-01
Abstract: With the rapid development of the Internet, the amount of data online has grown enormously, and website technology is constantly changing. Faced with the huge and complex data on the global Internet, how to crawl and use this information has become a major challenge. A traditional stand-alone web crawler struggles to cope with the rapid growth of information and cannot fetch huge amounts of data quickly and effectively. In this paper, we use distributed technology to design and implement an efficient, configurable, load-balancing, and scalable distributed web crawler system.