Level-Biased Statistics in the Hierarchical Structure of the Web

Guang Feng, Tie-Yan Liu, Xu-Dong Zhang, Wei-Ying Ma
DOI: https://doi.org/10.1007/11731139_37
2006-01-01
Abstract: In the literature on web search and mining, researchers have usually regarded the World Wide Web as a flat network in which all pages and all hyperlinks are considered equivalent. However, it is common knowledge that the Web is naturally organized into a hierarchical structure according to the URLs of its pages. By exploring this hierarchical structure, we found several level-biased characteristics of the Web. First, the distribution of pages over levels has a spindle shape. Second, the average indegree of each level decreases sharply as the level goes deeper. Third, although the indegree distributions of the deeper levels obey the same power law as the global indegree distribution, the top levels show quite different statistical characteristics. We believe these new findings may reflect essential properties of the Web, and that by making use of them, current web search and mining technologies could be improved to provide better services to web users.
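The level-biased statistics above depend on assigning each page a level from its URL and then aggregating page counts and indegrees per level. The following is a minimal Python sketch that assumes a hypothetical level definition (the site root is level 1, and each non-empty path segment adds one level) and a toy link list. The paper's exact level definition and crawl data are not given here. The names `url_level` and `level_statistics` are illustrative only.

```python
from collections import defaultdict
from urllib.parse import urlparse


def url_level(url: str) -> int:
    """Hypothetical level definition: root = level 1, +1 per non-empty path segment."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return 1 + len(segments)


def level_statistics(pages, links):
    """Return {level: (page_count, average_indegree)} for the given pages.

    Links whose target is not in `pages` are ignored (assumption: only
    crawled pages contribute to the per-level statistics).
    """
    indegree = {p: 0 for p in pages}
    for _src, dst in links:
        if dst in indegree:
            indegree[dst] += 1

    counts = defaultdict(int)   # number of pages per level
    totals = defaultdict(int)   # summed indegree per level
    for p in pages:
        lvl = url_level(p)
        counts[lvl] += 1
        totals[lvl] += indegree[p]

    return {lvl: (counts[lvl], totals[lvl] / counts[lvl]) for lvl in sorted(counts)}


if __name__ == "__main__":
    # Toy site (hypothetical data): one root, two directories, two leaf pages.
    pages = [
        "http://example.com/",
        "http://example.com/a/",
        "http://example.com/b/",
        "http://example.com/a/x.html",
        "http://example.com/a/y.html",
    ]
    links = [
        ("http://example.com/a/", "http://example.com/"),
        ("http://example.com/b/", "http://example.com/"),
        ("http://example.com/a/x.html", "http://example.com/"),
        ("http://example.com/", "http://example.com/a/"),
        ("http://example.com/", "http://example.com/b/"),
        ("http://example.com/a/", "http://example.com/a/x.html"),
    ]
    print(level_statistics(pages, links))
    # → {1: (1, 3.0), 2: (2, 1.0), 3: (2, 0.5)}
```

Even on this tiny toy graph, the average indegree falls as the level deepens, which matches the shape of the second observation in the abstract. This toy example does not verify that observation. A real analysis would also need a principled treatment of index pages, dynamic URLs, and links that point outside the crawl.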