Joint Use of Multiple Learned Statistics for Improving Online Source Selection

Thomas Hernandez, Zaiqing Nie, Subbarao Kambhampati
2004-01-01
Abstract: The autonomous and decentralized nature of available online sources prevents most existing integration systems from supporting flexible query processing that takes into account conflicting user objectives such as coverage, cost-related, or data-quality objectives. To achieve multi-objective query processing, a data integration system must be able to determine which sources are most relevant for a particular query, given the desired objectives. To do so, it must gather and use source-specific statistics. In this paper we present an approach which automatically gathers coverage and overlap statistics as well as response time statistics, and jointly uses these statistics to select relevant sources. We describe our approach and present experimental results obtained in the context of BibFinder that demonstrate the efficiency and effectiveness of our approach.
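To make the idea of jointly using coverage, overlap, and response-time statistics concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method from the paper: the `SourceStats` type, the `select_sources` function, the linear weighted utility, and the crude residual-coverage estimate (coverage minus the largest pairwise overlap with an already selected source) are all hypothetical choices, and the abstract does not specify how the statistics are actually combined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceStats:
    """Hypothetical per-source statistics, assumed to be learned elsewhere."""
    name: str
    coverage: float       # estimated fraction of all answers the source returns
    response_time: float  # estimated response time in seconds


def select_sources(stats, overlap, k, weight=0.5):
    """Greedily pick up to k sources by a weighted coverage/speed utility.

    overlap maps frozenset({name_a, name_b}) to the estimated fraction of
    all answers returned by both sources. weight=1.0 considers coverage
    only; weight=0.0 considers response time only. (Assumed utility form.)
    """
    if not stats:
        return []
    max_time = max(s.response_time for s in stats)
    selected = []
    remaining = list(stats)

    def utility(s):
        # Crude residual coverage: discount by the largest estimated
        # overlap with any source that has already been selected.
        shared = max(
            (overlap.get(frozenset((s.name, t.name)), 0.0) for t in selected),
            default=0.0,
        )
        residual = max(s.coverage - shared, 0.0)
        # Normalized speed score in [0, 1]; the slowest source scores 0.
        speed = 1.0 - s.response_time / max_time if max_time > 0 else 1.0
        return weight * residual + (1.0 - weight) * speed

    while remaining and len(selected) < k:
        best = max(remaining, key=utility)
        selected.append(best)
        remaining.remove(best)
    return [s.name for s in selected]
```

With `weight` close to 1 the sketch favors sources that add new answers, and with `weight` close to 0 it favors fast sources, which is one simple way to expose the coverage versus cost trade-off to the user.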