Web topology search based on multithread recursive model

Zhang Mingwu, Yang Bo, Zhu Shenglin, Zhang Wenzheng
2007-01-01
Abstract: The Web is growing and evolving at a rapid pace. It can be modeled as a directed graph in which a node represents a Web page and an edge represents a hyperlink. Several search engines are used to search Internet information, and most of them rely mainly on page content and text. In addition, some website topology approaches use interesting association rules to measure the interestingness between two sets of Web pages within a website. This paper describes our ongoing work on Webdigger, a scalable Web topology searcher based on a multithread recursive model. Webdigger describes the relations between network nodes so that these relations can be analyzed and topology discovery made more efficient. It uses a recursive algorithm to discover site structure and build a map view. It not only finds the link relations of websites but also analyzes and processes cross-links and loop-links. In our experiments in the large-scale Internet Web environment, it reports Web node relations classified as self-linked, cross-linked, and outer-linked. The experimental results show that, on average, the ratio of a website being linked by other sites is 18.4%, the self-linked ratio is 47.4%, and 8.7% of hyperlinks are mislinked.
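The abstract does not give Webdigger's algorithm, so the following is only a minimal sketch of the general idea under stated assumptions: a thread pool recursively expands pages of one site, a shared visited set keeps cross-links and loop-links from being crawled twice, and each hyperlink is then classified. The names `TopologySearcher`, `fetch_links`, and `PAGES` are hypothetical, an in-memory page map stands in for real HTTP fetching, and the categories "self", "outer", and "missing" are our own approximations of the paper's self-linked, outer-linked, and miss-linked links.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlparse


class TopologySearcher:
    """Hypothetical sketch of a multithreaded recursive link-topology crawler."""

    def __init__(self, fetch_links, max_workers=4):
        # fetch_links(url) -> list of outgoing URLs; assumed to raise KeyError
        # when the page does not exist (an illustrative interface, not Webdigger's).
        self.fetch_links = fetch_links
        self.max_workers = max_workers
        self.lock = threading.Lock()
        self.all_done = threading.Condition(self.lock)
        self.visited = set()   # pages already scheduled; breaks loop-links
        self.missing = set()   # same-site pages that could not be fetched
        self.edges = []        # every hyperlink seen, as (source, target)
        self.pending = 0       # tasks submitted but not yet finished

    def _visit(self, pool, url, site):
        try:
            try:
                targets = self.fetch_links(url)
            except KeyError:
                with self.lock:
                    self.missing.add(url)
                return
            for target in targets:
                with self.lock:
                    self.edges.append((url, target))
                    # Recurse only into unseen pages of the same site, so
                    # cross-links and loop-links are recorded but not re-crawled.
                    if urlparse(target).netloc != site or target in self.visited:
                        continue
                    self.visited.add(target)
                    self.pending += 1
                # Submit the child instead of waiting on it, so a bounded
                # pool can never deadlock on its own recursion.
                pool.submit(self._visit, pool, target, site)
        finally:
            with self.all_done:
                self.pending -= 1
                if self.pending == 0:
                    self.all_done.notify_all()

    def crawl(self, root):
        site = urlparse(root).netloc
        with self.lock:
            self.visited.add(root)
            self.pending = 1
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            pool.submit(self._visit, pool, root, site)
            with self.all_done:
                while self.pending > 0:
                    self.all_done.wait()
        return self.link_ratios(site)

    def link_ratios(self, site):
        # Fraction of all recorded hyperlinks falling in each category.
        counts = {"self": 0, "outer": 0, "missing": 0}
        for _, target in self.edges:
            if urlparse(target).netloc != site:
                counts["outer"] += 1
            elif target in self.missing:
                counts["missing"] += 1
            else:
                counts["self"] += 1
        total = len(self.edges) or 1
        return {kind: n / total for kind, n in counts.items()}


# Hypothetical toy site: x and y cross-link, x loops back to the root,
# one link leaves the site, and one link points to a page that does not exist.
PAGES = {
    "http://a.example/": ["http://a.example/x", "http://a.example/y", "http://b.example/"],
    "http://a.example/x": ["http://a.example/y", "http://a.example/"],
    "http://a.example/y": ["http://a.example/x", "http://a.example/gone"],
}

searcher = TopologySearcher(lambda url: PAGES[url], max_workers=3)
ratios = searcher.crawl("http://a.example/")
print(ratios)
```

On this toy site the crawler records 7 hyperlinks, each page is fetched exactly once despite the cycle, and the ratios come out as 5/7 self, 1/7 outer, and 1/7 missing regardless of thread scheduling. A pending-task counter guarded by a condition variable is used to detect completion, since the recursion is expressed as task submission rather than nested calls.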