Design and Implementation of Distributed Web Crawler for Open Access Journal

Zhenxiong YANG, Zurui CAI, Guohua CHEN, Yong TANG, Long ZHANG
DOI: https://doi.org/10.3778/j.issn.1673-9418.1405051
2014-01-01
Abstract: Open access journals are a kind of deep online resource dispersed across the Internet. Traditional search engines have difficulty indexing them, so users cannot reach open access journals directly through search engines, and these open resources go to waste. This paper proposes a novel focused Web crawler with a distributed architecture to collect the open access journal resources scattered throughout the Internet. The architecture adopts a distributed master-slave design consisting of a master control center and multiple distributed crawler nodes, and the paper proposes a method for extracting academic information from open access journals based on user-predefined rules. The distributed crawler nodes can be adjusted dynamically and use a Chrome browser based plug-in mechanism to achieve scalability and deployment flexibility.
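The two ideas in the abstract, a master control center that hands work to dynamically registered crawler nodes and extraction driven by user-predefined rules, can be illustrated with a minimal in-process sketch. This is not the paper's implementation: the class name `Master`, its methods, the `RULES` table, and the regular expressions are all hypothetical, and the rule format actually used by the authors is not given in the abstract. Networking and the Chrome plug-in mechanism are omitted.

```python
import re
from collections import deque

# Hypothetical user-predefined rules: field name -> regex with one capture group.
RULES = {
    "title": re.compile(r"<h1[^>]*>(.*?)</h1>", re.S),
    "doi": re.compile(r"doi\.org/(10\.\d{4,9}/[^\s\"<]+)"),
}


def extract(html, rules=RULES):
    """Apply each rule to the page and keep the first match per field."""
    record = {}
    for field, pattern in rules.items():
        match = pattern.search(html)
        if match:
            record[field] = match.group(1).strip()
    return record


class Master:
    """Control center (assumed design): owns the URL frontier and the node set."""

    def __init__(self, seeds):
        self.frontier = deque(seeds)   # URLs waiting to be crawled
        self.seen = set(seeds)         # URLs already queued, to avoid duplicates
        self.nodes = set()             # currently registered crawler nodes
        self.results = []              # extracted records reported by nodes

    def register(self, node_id):
        """A crawler node joins; nodes can be added at any time."""
        self.nodes.add(node_id)

    def unregister(self, node_id):
        """A crawler node leaves; it receives no further tasks."""
        self.nodes.discard(node_id)

    def next_task(self, node_id):
        """Return the next URL for a registered node, or None if none is pending."""
        if node_id not in self.nodes or not self.frontier:
            return None
        return self.frontier.popleft()

    def report(self, record, new_urls=()):
        """Store a node's extracted record and queue any unseen URLs it found."""
        self.results.append(record)
        for url in new_urls:
            if url not in self.seen:
                self.seen.add(url)
                self.frontier.append(url)
```

In this sketch a node would loop over `next_task`, fetch the page, call `extract`, and send the result back through `report`. Keeping the frontier and the duplicate check on the master is one simple way to let nodes join or leave without losing crawl state.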