A Topic-specific Intelligent Web Crawler System

Rong QIAN, Xinhua XU, Ying ZHENG, Bingru YANG
DOI: https://doi.org/10.3969/j.issn.1000-3428.2006.03.021
2006-01-01
Abstract: This paper introduces a topic-specific intelligent Web crawler system and its crawling algorithm, which is based on Web content and structure mining. The algorithm takes full advantage of the characteristics of neural networks: it can conveniently simulate the network topology and supports parallel computation. The paper introduces reinforcement learning to judge the relevance of a crawled page to the topic. When computing this relevance, the system does not consider the whole content of the Web page. Instead, it extracts the important HTML tags that make up the page and analyzes the page's content and structure. On that basis it judges the relevance of the crawled page to the topic, which greatly improves the efficiency and accuracy of information collection.
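The tag-based relevance idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the abstract describes a neural network and reinforcement learning, neither of which is reproduced here. The sketch only shows the step of scoring a page from a few important HTML tags rather than its full text. The tag list, the weights in TAG_WEIGHTS, and the names ImportantTagParser and relevance are all assumptions made for illustration.

```python
from html.parser import HTMLParser
import re

# Hypothetical tag weights; the abstract does not specify which tags
# are "important" or how they are weighted.
TAG_WEIGHTS = {"title": 3.0, "h1": 2.0, "h2": 1.5, "a": 1.0}


class ImportantTagParser(HTMLParser):
    """Collects words that appear inside the weighted tags only."""

    def __init__(self):
        super().__init__()
        self.stack = []   # weighted tags that are currently open
        self.words = []   # (word, weight) pairs

    def handle_starttag(self, tag, attrs):
        if tag in TAG_WEIGHTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in self.stack:
            # Remove the most recently opened matching tag.
            idx = len(self.stack) - 1 - self.stack[::-1].index(tag)
            del self.stack[idx]

    def handle_data(self, data):
        if not self.stack:
            return  # text outside the weighted tags is ignored
        weight = max(TAG_WEIGHTS[t] for t in self.stack)
        for word in re.findall(r"[a-z0-9]+", data.lower()):
            self.words.append((word, weight))


def relevance(html, topic_terms):
    """Weighted share of important-tag words that match the topic terms."""
    parser = ImportantTagParser()
    parser.feed(html)
    parser.close()
    total = sum(weight for _, weight in parser.words)
    if total == 0:
        return 0.0
    topic = {term.lower() for term in topic_terms}
    hits = sum(weight for word, weight in parser.words if word in topic)
    return hits / total
```

A crawler could call relevance(page_html, ["crawler", "web"]) and follow a page's links only when the score exceeds a chosen threshold. Body text in ordinary paragraphs does not affect the score, which mirrors the abstract's point about not processing the whole page.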