A New Method to Query Document Database by Content and Structure

WEN Ji-Rong, LUAN Jin-Feng, MA Wei-Ying, DONG Yi-Sheng
2002-01-01
Abstract: Structured documents are made up of logical components such as titles, sections, subsections and paragraphs. The components of each structured document can be represented by an ordered tree, which can also be viewed as a hierarchy of concepts. To give users more precise and focused search results, retrieval techniques should allow them to retrieve document components at varying granularity. This paper presents a new method to query a document database by content and structure. The key idea is to construct a more comprehensive similarity function that takes advantage of the inherent hierarchical structure of documents. The work combines information retrieval techniques, semi-structured data querying and proximate search to query structured documents. The proposed method is evaluated on the Encarta encyclopedia document set, and the experimental results show that it provides more accurate and focused answers than traditional document retrieval methods.
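
The abstract does not give the similarity function itself, so the following is only a rough sketch of the general idea: score every component of an ordered document tree against a query, and let each component's score also reflect its descendants. Everything specific here is an assumption made for illustration and is not taken from the paper: the `Component` class, bag-of-words cosine similarity, averaging the child scores, and the `decay` weight of 0.5.

```python
from __future__ import annotations

import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Component:
    """One logical component (title, section, paragraph) of an ordered tree."""
    label: str
    text: str = ""
    children: list[Component] = field(default_factory=list)


def terms(text: str) -> Counter:
    """Bag-of-words term counts (lowercased, whitespace tokenised)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * \
        math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def score(node: Component, query: Counter, decay: float = 0.5):
    """Return (score of node, [(score, label), ...] for the whole subtree).

    Assumed combination rule (not the paper's): a node's score is its own
    content similarity plus `decay` times the mean score of its children.
    """
    own = cosine(terms(node.text), query)
    ranked, child_scores = [], []
    for child in node.children:  # children are visited in document order
        child_score, sub_ranked = score(child, query, decay)
        child_scores.append(child_score)
        ranked.extend(sub_ranked)
    mean_children = sum(child_scores) / len(child_scores) if child_scores else 0.0
    total = own + decay * mean_children
    ranked.append((total, node.label))
    return total, ranked


# A toy document: article -> sections -> paragraphs.
doc = Component("article", "Volcano", [
    Component("section 1", "Eruptions", [
        Component("p1", "lava flows from the volcano during an eruption"),
        Component("p2", "ash clouds disrupt air travel"),
    ]),
    Component("section 2", "History", [
        Component("p3", "the city was founded in 1200"),
    ]),
])
query = terms("lava eruption")

_, ranked = score(doc, query)
for value, label in sorted(ranked, reverse=True):
    print(f"{label}: {value:.3f}")
```

In this toy example the paragraph that actually contains the query terms ranks first, its enclosing section ranks below it, and the whole article lower still, which illustrates how a structure-aware score can return a focused component rather than an entire document. A real system would need proper tokenisation, term weighting, and a combination rule validated on data.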