Quality-Based Data Source Selection for Web-Scale Deep Web Data Integration

Xue-Feng Xian, Peng-Peng Zhao, Wei Fang, Jie Xin, Zhi-Ming Cui
DOI: https://doi.org/10.1109/icmlc.2009.5212537
2009-01-01
Abstract: The Deep Web has become an important resource on the web because of its rich, high-quality information, giving rise to a new application area in data mining, information retrieval, and information integration. In web-scale Deep Web data integration tasks, where hundreds or thousands of data sources may provide data relevant to a particular domain, integrating all available Deep Web sources is inefficient. This paper proposes a data source selection approach based on the quality of Deep Web sources. It automatically finds the highest-quality set of Deep Web sources related to a particular domain, which is a prerequisite for effective Deep Web data integration. The quality of a data source is assessed by evaluating quality dimensions that represent the characteristics of Deep Web sources. Experiments on real Deep Web sources collected from the Internet show that our approach provides an effective and scalable solution for selecting data sources for Deep Web data integration.
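The selection idea described in the abstract can be illustrated with a minimal sketch. The abstract does not name the quality dimensions or say how they are combined, so everything specific here is an assumption made for illustration: the dimension names (`completeness`, `response_speed`, `availability`), the weights, the normalized weighted sum, and the function names `quality_score` and `select_sources` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical dimension scores in [0, 1]; the paper's actual dimensions
# and scoring model are not given in the abstract.
Dimensions = Dict[str, float]


def quality_score(dimensions: Dimensions, weights: Dict[str, float]) -> float:
    """Combine per-dimension scores into one value with a normalized weighted sum.

    The weighted sum is an assumed aggregation, not the paper's stated method.
    A dimension missing from a source counts as 0.0.
    """
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * dimensions.get(name, 0.0) for name, w in weights.items())
    return weighted / total_weight


def select_sources(
    sources: Dict[str, Dimensions], weights: Dict[str, float], k: int
) -> List[Tuple[str, float]]:
    """Return the k sources with the highest aggregate quality score."""
    scored = [(name, quality_score(dims, weights)) for name, dims in sources.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up example data for three hypothetical sources in one domain.
example_weights = {"completeness": 2.0, "response_speed": 1.0, "availability": 1.0}
example_sources = {
    "source_a": {"completeness": 1.0, "response_speed": 0.5, "availability": 0.5},
    "source_b": {"completeness": 0.5, "response_speed": 0.5, "availability": 1.0},
    "source_c": {"completeness": 0.0, "response_speed": 1.0, "availability": 0.0},
}
selected = select_sources(example_sources, example_weights, k=2)
```

Ranking by a single aggregate score keeps selection cheap as the number of candidate sources grows, which is one plausible way to meet the scalability goal the abstract mentions; the paper itself may use a different model.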