Efficient Selection and Integration of Hidden Web Database.

Xuefeng Xian, Pengpeng Zhao, Yuanfeng Yang, Jie Xin, Zhiming Cui
DOI: https://doi.org/10.4304/jcp.5.4.500-507
2010-01-01
Journal of Computers
Abstract: An ever-increasing amount of valuable information is stored in web databases, "hidden" behind search interfaces. A new application area emerges for information retrieval and integration. There may be hundreds or thousands of web databases providing data relevant to a particular domain on the web. A primary challenge for internet-scale hidden web database integration is therefore to determine which web databases to include in the integration system, with the aim of making the system contain as much high-quality data as possible with the least degree of overlap. In this paper, we present an approach to iteratively select and integrate candidate web databases. The core of this approach is a benefit function that evaluates how much benefit integrating a web database brings to a given state of the integration system. We devise a benefit function based on the volume and quality of the new data that integrating the web database adds to the integration system. We show in practice how to apply our approach efficiently to select and integrate web databases. Our experiments on real hidden web databases indicate that the set of web databases selected and integrated by our approach yields an integration system with significantly higher utility than those produced by a wide range of other strategies.
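The iterative selection described in the abstract can be illustrated with a minimal greedy sketch. The abstract does not give the paper's actual benefit function, so the one below (number of new records multiplied by a per-database quality score) is only an assumed stand-in, and the names `benefit`, `select_databases`, and `budget` are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def benefit(records: Set[str], integrated: Set[str], quality: float) -> float:
    """Assumed stand-in benefit: count of records not yet integrated,
    weighted by a quality score in [0, 1]. Not the paper's formula."""
    return quality * len(records - integrated)


def select_databases(
    candidates: Dict[str, Tuple[Set[str], float]], budget: int
) -> List[str]:
    """Greedily pick up to `budget` databases, each time taking the one
    with the highest benefit relative to the current integrated state."""
    integrated: Set[str] = set()
    chosen: List[str] = []
    remaining = dict(candidates)
    while remaining and len(chosen) < budget:
        # Iterate names in sorted order so ties break deterministically.
        best = max(
            sorted(remaining),
            key=lambda name: benefit(
                remaining[name][0], integrated, remaining[name][1]
            ),
        )
        records, quality = remaining[best]
        if benefit(records, integrated, quality) <= 0:
            break  # no remaining candidate contributes new data
        chosen.append(best)
        integrated |= records
        del remaining[best]
    return chosen
```

In this sketch a database that only duplicates already integrated records has zero benefit and is never selected, which mirrors the stated goal of keeping overlap low.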