Keyword Search Based XML Data Source Selection

ZHU Guan-sheng, HUANG Hao, YANG Wei-dong
DOI: https://doi.org/10.3969/j.issn.1000-1220.2012.06.006
2012-01-01
Abstract: With the rapid growth of data on the Internet, data are distributed across several data sources rather than stored in a single one. A user's keyword query is delivered to each data source for processing to obtain the result. To accelerate query evaluation, the key problem is how to select the data sources relevant to the keyword query. In this paper, we propose a keyword search based XML data source selection method. To make it easier to predict the relevance of a data source to a query, we propose XDS (XML data source summary) to summarize the relationship between keywords and the data source. The nodes in XML documents are organized hierarchically; we capture this feature together with the textual information of the XML documents and integrate them into an evaluation formula, which is defined recursively and used to construct the XDS. The XDS stores numerical information representing the relevance between keyword pairs and the data source. Along with XDS, updating algorithms and a threshold-based compacting strategy are provided to improve runtime performance. Based on XDS, we propose four selection methods for selecting the top-k data sources most relevant to a user's keyword query. We evaluate these selection methods, as well as the K-Graph method, on the DBLP dataset and compare them with one another. The results show that the best-performing of our proposed methods is both efficient and effective.
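The general idea of ranking data sources by a keyword-pair summary can be illustrated with a minimal sketch. It is not the paper's method: the abstract does not give the recursive evaluation formula or the four selection methods, so the pair scores are taken as precomputed inputs, and summing them over the query's keyword pairs is an assumption made purely for illustration. The names `Summary`, `compact`, `score`, and `select_top_k` are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical summary type: maps an unordered keyword pair to a relevance
# score for one data source. How the scores are computed is NOT shown here.
Summary = Dict[FrozenSet[str], float]


def compact(summary: Summary, threshold: float) -> Summary:
    """Drop keyword pairs whose score is below the threshold (assumed strategy)."""
    return {pair: s for pair, s in summary.items() if s >= threshold}


def score(summary: Summary, query: Iterable[str]) -> float:
    """Sum the stored scores of all keyword pairs in the query.

    Summation is an illustrative assumption. A single-keyword query has no
    pairs and therefore scores 0.0 under this sketch.
    """
    keywords = sorted({k.lower() for k in query})
    return sum(summary.get(frozenset(p), 0.0) for p in combinations(keywords, 2))


def select_top_k(summaries: Dict[str, Summary], query: Iterable[str], k: int) -> List[str]:
    """Return the names of the k highest-scoring data sources.

    Ties are broken by source name so the result is deterministic.
    """
    query = list(query)
    ranked = sorted(
        ((score(s, query), name) for name, s in summaries.items()),
        key=lambda t: (-t[0], t[1]),
    )
    return [name for _, name in ranked[:k]]
```

Under these assumptions, each data source is scored using only its summary, so the query needs to be forwarded to just the k selected sources rather than to all of them.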