Abstract: Most research on web information processing has concentrated on web pages and the hyperlinks among them. This ignores the important fact that a web page is only one building block of a whole website. The situation has gradually changed in recent years, driven by needs such as website reputation calculation and high-level website structure mining. As a result, website ranking has become a hot research topic, and various site ranking algorithms, such as SiteRank and AggregateRank, have been proposed. However, most existing website ranking algorithms use only website link graphs and usually do not take the content of websites into consideration, which is clearly insufficient for reliable website ranking. To address this issue, this paper introduces two content-based features, semantic relevance and time frequency, and proposes a new algorithm, STRank, based on these two features. We first conduct a series of experiments to verify the feasibility of using these two factors for site ranking. Then semantic relevance is applied in the calculation of transition probabilities, and the updating frequency of sites is incorporated into the ranking. Since the traditional Kendall's tau distance and Spearman's Footrule distance are not appropriate for evaluating site rankings, we modify them accordingly to evaluate website ranking algorithms. Finally, our experiments show that STRank outperforms existing approaches in both effectiveness and efficiency.
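The abstract does not give STRank's formulas, so the following Python sketch only illustrates the general idea under stated assumptions. It runs a PageRank-style power iteration on a site-level link graph. Transition probabilities are made proportional to a per-link semantic relevance score that is supplied as input. The resulting link score is then linearly blended with normalized update frequency. The blend weight `alpha`, the function names, and the input formats are all hypothetical. The evaluation helpers compute the standard, unmodified Kendall's tau distance and Spearman's Footrule distance. The paper's modifications to these measures are not described here and are not reproduced.

```python
from itertools import combinations


def site_rank_sketch(links, relevance, update_freq,
                     damping=0.85, alpha=0.3, max_iter=200, tol=1e-10):
    """Hypothetical STRank-style scorer (NOT the paper's exact formulas).

    links:       dict site -> iterable of sites it links to
    relevance:   dict (src, dst) -> non-negative semantic relevance score (assumed input)
    update_freq: dict site -> observed update frequency (assumed input)
    """
    sites = sorted(set(links) | {d for outs in links.values() for d in outs})
    n = len(sites)

    # Transition probabilities proportional to the semantic relevance of each link.
    trans = {}
    for s in sites:
        outs = set(links.get(s, ()))
        total = sum(relevance.get((s, d), 0.0) for d in outs)
        trans[s] = ({d: relevance.get((s, d), 0.0) / total for d in outs}
                    if total > 0 else {})  # no usable out-links: treated as dangling

    # Power iteration with uniform teleportation; dangling mass is spread uniformly.
    rank = {s: 1.0 / n for s in sites}
    for _ in range(max_iter):
        dangling = sum(rank[s] for s in sites if not trans[s])
        new = {s: (1 - damping) / n + damping * dangling / n for s in sites}
        for s in sites:
            for d, p in trans[s].items():
                new[d] += damping * rank[s] * p
        converged = sum(abs(new[s] - rank[s]) for s in sites) < tol
        rank = new
        if converged:
            break

    # Assumed linear blend of the link score with normalized update frequency.
    total_freq = sum(update_freq.get(s, 0.0) for s in sites)
    freq = {s: (update_freq.get(s, 0.0) / total_freq if total_freq > 0 else 1.0 / n)
            for s in sites}
    return {s: (1 - alpha) * rank[s] + alpha * freq[s] for s in sites}


def kendall_tau_distance(r1, r2):
    """Count item pairs ordered differently by two rankings of the same items."""
    pos1 = {x: i for i, x in enumerate(r1)}
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(1 for a, b in combinations(r1, 2)
               if (pos1[a] - pos1[b]) * (pos2[a] - pos2[b]) < 0)


def footrule_distance(r1, r2):
    """Sum of absolute position differences between two rankings of the same items."""
    pos2 = {x: i for i, x in enumerate(r2)}
    return sum(abs(i - pos2[x]) for i, x in enumerate(r1))


# Toy site graph with made-up relevance scores and update frequencies.
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
relevance = {("a", "b"): 0.9, ("a", "c"): 0.1, ("b", "c"): 1.0,
             ("c", "a"): 1.0, ("d", "c"): 0.5}
update_freq = {"a": 5, "b": 1, "c": 3, "d": 0}

blended = site_rank_sketch(links, relevance, update_freq)
link_only = site_rank_sketch(links, relevance, update_freq, alpha=0.0)
blended_ranking = sorted(blended, key=blended.get, reverse=True)
link_ranking = sorted(link_only, key=link_only.get, reverse=True)
print(blended_ranking, link_ranking)
print(kendall_tau_distance(blended_ranking, link_ranking),
      footrule_distance(blended_ranking, link_ranking))
```

With these toy inputs, blending in update frequency moves site `a` above site `c`. The two distance functions then measure how far the blended ranking departs from the link-only one. A linear blend was chosen only because it keeps the scores summing to 1. The paper may combine the two signals differently.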

A Study on Combination of Block Importance and Relevance to Estimate Page Relevance

Learning Block Importance Models for Web Pages

Learning Important Models for Web Page Blocks Based on Layout and Content Analysis

Relevance Estimation with Multiple Information Sources on Search Engine Result Pages

Block-based Web Search

Improving Web Search Ranking by Incorporating Summarization

A New Approach to Query Segmentation for Relevance Ranking in Web Search

Search Result Reranking with Visual and Structure Information Sources

Investigating Passage-level Relevance and Its Role in Document-level Relevance Judgment

Exploiting PageRank at Different Block Level

When does Relevance Mean Usefulness and User Satisfaction in Web Search?

Understanding User Situational Relevance in Ranking Web Search Results

Visual Block Link Analysis for Image Re-Ranking

Improve Ranking by Using Image Information

Block-based Language Modeling Approach towards Web Search

Improving Pseudo-Relevance Feedback in Web Information Retrieval Using Web Page Segmentation

Using Anchor Text Refined by Page Importance to Improve Web Retrieval

A Document Relevance Based Search Result Re-Ranking

STRank: A SiteRank Algorithm Using Semantic Relevance and Time Frequency

Query Segmentation for Relevance Ranking in Web Search

Algorithm for Webpage Semantic Blocks Mining Using Tree Match Method