Data Source Selection for Large-Scale Deep Web Data Integration

Xuefeng Xian, Pengpeng Zhao, Wei Fang, Jie Xin, Zhiming Cui
DOI: https://doi.org/10.1109/wmwa.2009.25
2009-01-01
Abstract: The deep web has become an important resource on the web due to its rich, high-quality information, giving rise to a new application area in data mining and integration. There may be hundreds or thousands of data sources providing data relevant to a particular domain on the web, so a primary challenge for large-scale deep web data integration is to determine in what order users should integrate candidate data sources. In this paper, we develop a most-benefit approach (MBA) for ordering candidate data sources for user integration. At the core of this approach is a utility function that quantifies the utility of a given state of the integration system; we therefore devise a utility function for the integration system based on the number of query results. We show in practice how to efficiently apply MBA in concert with this utility function to order data sources. A detailed experimental evaluation on real datasets shows that the ordering of data sources produced by this MBA-based approach yields an integration system with significantly higher utility than a wide range of other ordering strategies.
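The abstract does not give the details of MBA or of its utility function, so the following is only a minimal sketch of one plausible reading: a greedy ordering that repeatedly picks the candidate source with the largest marginal gain in utility. Everything here is an assumption made for illustration, not the authors' method: the names `utility` and `order_sources`, the modeling of each source as a mapping from query to a set of result identifiers, and the choice of "number of distinct results over a query workload" as the utility.

```python
from typing import Dict, List, Set

# Hypothetical model: a source maps each query to the set of record ids it returns.
SourceResults = Dict[str, Set[str]]


def utility(integrated: List[SourceResults], queries: List[str]) -> int:
    """Assumed utility: total number of distinct results over the query workload."""
    total = 0
    for query in queries:
        seen: Set[str] = set()
        for source in integrated:
            seen |= source.get(query, set())
        total += len(seen)
    return total


def order_sources(candidates: Dict[str, SourceResults], queries: List[str]) -> List[str]:
    """Greedily order sources by marginal utility gain (illustrative only)."""
    remaining = dict(candidates)
    integrated: List[SourceResults] = []
    order: List[str] = []
    while remaining:
        base = utility(integrated, queries)
        # Pick the source whose integration raises utility the most;
        # ties are broken by name so the result is deterministic.
        best = max(
            sorted(remaining),
            key=lambda name: utility(integrated + [remaining[name]], queries) - base,
        )
        order.append(best)
        integrated.append(remaining.pop(best))
    return order


if __name__ == "__main__":
    sources = {
        "a": {"q1": {"1", "2"}},
        "b": {"q1": {"1", "2", "3"}, "q2": {"9"}},
        "c": {"q1": {"4"}},
    }
    print(order_sources(sources, ["q1", "q2"]))
```

In this toy workload, source "b" is chosen first because it contributes the most new results, and "a" is chosen last because everything it returns is already covered. A real system would have to estimate result counts by probing sources rather than knowing them in advance.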