DF or IDF? On the Use of HTML Primary Feature Fields for Web IR

Min Zhang, Ruihua Song, Shaoping Ma
2003-01-01
Abstract: This paper describes a new document-frequency-related query term weighting schema for Web information retrieval that uses HTML structure information. First, the concept of the Primary Feature (PF) Space is proposed; it is composed of the more informative fields in HTML documents, such as emphasized bold words. Second, a new PF query term weighting schema is proposed that takes the logarithm of DF into account instead of the usual IDF factor. Finally, a strategy is given for combining term weighting on the Primary Feature Field with term weighting on the general body text. Consistent, substantial improvements in performance confirm the reliability and effectiveness of the PF term weighting schema.
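To make the contrast between the two factors concrete, the following is a minimal Python sketch and not the paper's actual formulation. The abstract does not give the exact formulas, so the `log(1 + df)` smoothing, the `log(N / df)` form of IDF, the linear mixing, and the names `idf_weight`, `log_df_weight`, `combined_weight`, and `alpha` are all assumptions made for illustration.

```python
import math


def idf_weight(df, n_docs):
    """Conventional IDF: terms that occur in fewer documents score higher."""
    return math.log(n_docs / df)


def log_df_weight(df):
    """Log of document frequency: terms that occur in more documents score higher.

    The +1 smoothing is an assumption, used so that df = 0 yields a weight of 0.
    """
    return math.log(1 + df)


def combined_weight(df_pf, df_body, n_docs, alpha=0.5):
    """Hypothetical linear mix of a PF-field weight and a body-text weight.

    alpha is an illustrative parameter, not a value taken from the paper.
    """
    return alpha * log_df_weight(df_pf) + (1 - alpha) * idf_weight(df_body, n_docs)
```

Under these assumptions, a term found in the Primary Feature Field of many documents gains weight from the log-DF factor, whereas the IDF factor applied to body text lowers the weight of a term as it becomes more common.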