Topic Distillation Algorithm Based on Site Resource

GUO Lishan, DONG Shoubin, YUAN Hua
DOI: https://doi.org/10.3321/j.issn:1000-0054.2005.09.004
2005-01-01
Abstract: Traditional topic distillation algorithms have known problems and cannot meet the requirements of the topic distillation task in the SEWM-2004 Chinese Web Search contest. Based on an analysis of the Hyperlink-Induced Topic Search (HITS) algorithm, this paper presents an improved algorithm, Hyperlink Analysis within CWT100G (HAC), which combines content analysis with hyperlink analysis and focuses on site resources. HAC first groups pages by site through URL pattern matching. It then models the intra-site hyperlink structure as a graph and, together with page content analysis, iteratively computes Hub/Authority values within each site. Finally, it analyzes the hyperlinks between sites to compute more accurate Hub/Authority values. The results of two comparative experiments show that HAC finds more suitable hub sites for a given query.
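The three steps in the abstract (group pages by site, run HITS inside each site with content analysis, then analyze inter-site links) can be illustrated with the Python sketch below. It is not the authors' HAC implementation; it rests on several assumptions. The host name stands in for the paper's URL pattern matching. Content analysis is modeled as a hypothetical per-page `relevance` score that scales each link by its target's relevance. A site's relevance is assumed to be the maximum of its pages' scores. The intra-site hub scores are used here only to pick each site's best hub page, since the abstract does not say how the two levels are combined. The names `site_of`, `hits`, and `rank_hub_sites` are hypothetical.

```python
import math
from collections import defaultdict
from urllib.parse import urlparse


def site_of(url):
    # Assumption: the host name stands in for HAC's URL pattern matching.
    return urlparse(url).netloc.lower()


def hits(nodes, edges, relevance=None, iterations=50):
    """Power-iteration HITS. `relevance` (hypothetical) scales each edge by
    its target's content score, one assumed form of content weighting."""
    rel = relevance or {}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of hub scores of pages linking in, content-weighted.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src] * rel.get(dst, 1.0)
        # Hub: sum of the new authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        # L2-normalize both vectors so the iteration converges.
        for vec in (new_auth, new_hub):
            norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
            for n in vec:
                vec[n] /= norm
        hub, auth = new_hub, new_auth
    return hub, auth


def rank_hub_sites(links, relevance=None, iterations=50):
    """Return (site, best_hub_page, site_hub_score), best hub site first."""
    rel = relevance or {}
    pages = {p for link in links for p in link} | set(rel)
    # Step 1: group pages by site.
    by_site = defaultdict(set)
    for page in pages:
        by_site[site_of(page)].add(page)
    # Step 2: HITS on each site's internal link graph.
    best_page = {}
    for site, members in by_site.items():
        inner = [(s, d) for s, d in links
                 if s in members and d in members and s != d]
        hub, _ = hits(members, inner, rel, iterations)
        best_page[site] = max(sorted(members), key=lambda p: hub[p])
    # Step 3: collapse cross-site links into a site graph and rerun HITS.
    outer = {(site_of(s), site_of(d)) for s, d in links
             if site_of(s) != site_of(d)}
    site_rel = {site: max(rel.get(p, 1.0) for p in members)
                for site, members in by_site.items()}
    site_hub, _ = hits(by_site.keys(), outer, site_rel, iterations)
    return sorted(((site, best_page[site], site_hub[site]) for site in by_site),
                  key=lambda t: (-t[2], t[0]))


# Toy link graph (invented for illustration only).
links = [
    ("http://a.com/index", "http://a.com/p1"),
    ("http://a.com/index", "http://a.com/p2"),
    ("http://a.com/index", "http://b.com/x"),
    ("http://a.com/index", "http://c.com/y"),
    ("http://b.com/x", "http://c.com/y"),
]
for site, page, score in rank_hub_sites(links):
    print(f"{site:8s} {score:.3f}  best hub page: {page}")
```

Collapsing cross-site links to a single edge per site pair keeps one site with many pages linking to another from inflating that site's scores. That is one plausible reading of why the abstract works at site granularity, not a claim about the paper's exact weighting.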