Keyword Searches in Data-Centric XML Documents Using Tree Partitioning

Li Guoliang, Feng Jianhua, Zhou Lizhu
DOI: https://doi.org/10.1016/s1007-0214(09)70002-1
2009-01-01
Abstract: This paper presents an effective keyword search method for data-centric extensible markup language (XML) documents. The method divides an XML document into compact, connected, integral subtrees, called self-integral trees (SI-Trees), to capture the structural information in the document. The SI-Trees are generated based on a schema guide. Meaningful self-integral trees (MSI-Trees), which contain all or some of the input keywords, are then identified to answer keyword searches over the XML document. Indexing is used to accelerate the retrieval of the MSI-Trees related to the input keywords. The MSI-Trees are ranked to identify the top-k results with the highest ranks. Extensive tests demonstrate that this method takes 10–100 ms to answer a keyword query and outperforms existing approaches by 1–2 orders of magnitude.
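The pipeline described in the abstract (partition, index, retrieve, rank) can be illustrated with a minimal sketch. This is not the authors' algorithm. It assumes that the schema guide can be approximated by a set of tag names whose elements act as SI-Tree roots, that an MSI-Tree is simply an SI-Tree containing at least one query keyword, and that ranking counts distinct matched keywords first and term frequency second. All function names, the sample document, and the scoring rule are hypothetical.

```python
import heapq
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def partition(root, si_root_tags):
    """Return every element whose tag is listed in si_root_tags.

    Each returned element stands in for one SI-Tree. The tag set is a
    hypothetical stand-in for the schema guide mentioned in the abstract.
    """
    return [elem for elem in root.iter() if elem.tag in si_root_tags]


def build_index(si_trees):
    """Build an inverted index: keyword -> {SI-Tree id: term frequency}."""
    index = defaultdict(dict)
    for tree_id, tree in enumerate(si_trees):
        counts = Counter(tokenize(" ".join(tree.itertext())))
        for term, freq in counts.items():
            index[term][tree_id] = freq
    return index


def search(index, query, k):
    """Return ids of the top-k SI-Trees that contain some or all keywords."""
    # tree id -> [number of distinct keywords matched, total term frequency]
    matched = defaultdict(lambda: [0, 0])
    for term in set(tokenize(query)):
        for tree_id, freq in index.get(term, {}).items():
            matched[tree_id][0] += 1
            matched[tree_id][1] += freq
    # Assumed scoring: distinct keywords, then frequency, then document order.
    best = heapq.nlargest(
        k, matched.items(),
        key=lambda item: (item[1][0], item[1][1], -item[0]),
    )
    return [tree_id for tree_id, _ in best]


# Hypothetical sample document; "paper" elements are taken as SI-Tree roots.
DOC = """
<bib>
  <paper><title>XML keyword search</title><author>Li</author></paper>
  <paper><title>Tree partitioning</title><author>Feng</author></paper>
  <paper><title>Keyword search with XML tree partitioning</title><author>Zhou</author></paper>
</bib>
"""

si_trees = partition(ET.fromstring(DOC), {"paper"})
index = build_index(si_trees)
results = search(index, "XML partitioning", k=2)
print(results)
```

For the query "XML partitioning", the third `paper` subtree matches both keywords and is ranked first; the other two subtrees each match one keyword, and the tie is broken by document order. Because the index maps each keyword directly to the subtrees that contain it, a query touches only the posting lists of its own keywords rather than the whole document.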