New algorithm of topic-oriented crawler

LI Wei-jiang, ZHAO Tie-jun, PIAO Xing-hai
2009-01-01
Abstract: General-purpose crawlers help people find information on the WWW. However, because they are general rather than specialized, they have drawbacks in precision and efficiency. This paper addresses two issues of the topic-oriented Web crawler: how to define the topic, and how to efficiently order the links waiting in the download queue. The aim is to visit only relevant pages and to collect a large number of hyperlinks that point to relevant pages. The proposed crawl method is novel and is based on the semi-structured features of websites and on content information. Experimental results show that it is a very effective method for focused crawling.
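
The abstract does not give the scoring rule or the queue structure, so the following is only a minimal sketch of the general idea of ordering queued links by topical relevance. It is a generic best-first frontier, not the authors' algorithm. The names `relevance` and `Frontier`, the representation of a topic as a set of lowercase terms, and the term-overlap score are all assumptions made for illustration.

```python
import heapq
from itertools import count


def relevance(text, topic_terms):
    """Fraction of topic terms found in the text (hypothetical scoring rule)."""
    if not topic_terms:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for term in topic_terms if term in words) / len(topic_terms)


class Frontier:
    """Queue of URLs to download; the highest-scoring URL is popped first."""

    def __init__(self):
        self._heap = []
        self._seen = set()    # URLs already queued, to avoid duplicates
        self._tie = count()   # preserves insertion order among equal scores

    def push(self, url, score):
        """Queue a URL with its score; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        # heapq is a min-heap, so the score is stored negated.
        heapq.heappush(self._heap, (-score, next(self._tie), url))
        return True

    def pop(self):
        """Remove and return the (url, score) pair with the highest score."""
        neg_score, _, url = heapq.heappop(self._heap)
        return url, -neg_score

    def __len__(self):
        return len(self._heap)
```

In such a sketch, a crawler would score each extracted link (for example from its anchor text) with `relevance`, call `push`, and download whatever `pop` returns next. A binary heap keeps both operations at O(log n), which is one plausible reading of ordering the queue "efficiently".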