Anchor Text and Its Context Based Web Information Retrieval

张敏,高剑峰,马少平
2004-01-01
Abstract: One of the most important differences between traditional information retrieval (IR) and web IR lies in the hyperlink structure of web pages, which motivates the so-called link-based retrieval techniques for web IR. This paper introduces the concept of an anchor description document and proposes several methods of using anchor text and its context for web IR. The methods are evaluated on the TREC 2001 collection, which contains over 1.69 million web pages. Several conclusions are drawn. First, anchor text precisely represents the topic of a web page but is insufficient to describe the page's content. Second, compared with the traditional content-based IR technique, using anchor text on the homepage finding task yields more than a 96% improvement in 11-point average precision, whereas it does not help on the ad hoc task, even with context information. Finally, combining anchor text-based and traditional content-based techniques yields a performance improvement of more than 16%.
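The results above are reported in 11-point average precision. As a reading aid, the following is a minimal sketch of that metric under its standard definition: interpolated precision is taken at the recall levels 0.0, 0.1, ..., 1.0 and averaged. The function name and the document identifiers are hypothetical, and this is not the evaluation code used in the paper.

```python
def eleven_point_average_precision(ranked_ids, relevant_ids):
    """Average of interpolated precision at recall 0.0, 0.1, ..., 1.0.

    ranked_ids: document ids in the order the system returned them.
    relevant_ids: ids judged relevant for the query.
    (Both names are illustrative assumptions, not from the paper.)
    """
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0

    # Record (recall, precision) each time a relevant document is retrieved.
    hits = 0
    points = []
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))

    # Interpolated precision at a recall level is the maximum precision
    # observed at any recall greater than or equal to that level.
    total = 0.0
    for i in range(11):
        level = i / 10
        total += max((p for r, p in points if r >= level), default=0.0)
    return total / 11


if __name__ == "__main__":
    # Hypothetical ranking with two relevant documents, at ranks 1 and 3.
    score = eleven_point_average_precision(["d1", "d2", "d3", "d4"], {"d1", "d3"})
    print(round(score, 4))
```

A single-query score like this is averaged over all queries in a task before two systems are compared, which is the level at which the relative improvements in the abstract are stated.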