Optimizing Communications in Processing Data Integration Queries

Jia Liu, Yongwei Wu, Guangwen Yang
DOI: https://doi.org/10.1109/chinagrid.2008.7
2008-01-01
Abstract: Because query processing in data integration must access data from numerous widely distributed sources over the network, it is crucial to investigate how to deal with the expensive communication overhead. This paper introduces a staged data integration model for grid environments. The model takes advantage of the abundant computer nodes to process integrated queries over a number of highly distributed, high-volume data sources. Its content-based scheduling algorithm groups queries over similar data sources together, which enhances the opportunities for data sharing among concurrent queries on the same data source. Furthermore, a multiple-query optimization approach is proposed to exploit data sharing and avoid redundant data transfer without sacrificing the autonomy of the data sources. Experimental results validate that our algorithms improve data integration performance in terms of both communication traffic and response time.
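The idea of grouping queries by data source so that concurrent queries share one transfer can be illustrated with a minimal sketch. This is not the paper's algorithm, which the abstract does not specify; the function names `group_by_source` and `run_batch`, the query and source labels, and the `fetch` callback are all hypothetical, and the sketch assumes each source returns the same data to every query in the batch.

```python
from collections import defaultdict


def group_by_source(queries):
    """Map each data source to the ids of the queries that read it."""
    groups = defaultdict(list)
    for query_id, sources in queries.items():
        for source in sorted(sources):
            groups[source].append(query_id)
    return dict(groups)


def run_batch(queries, fetch):
    """Fetch every distinct source once and share the result across queries.

    Returns the per-query data and the number of transfers performed.
    """
    cache = {}      # source -> data already transferred in this batch
    transfers = 0
    results = {}
    for query_id, sources in queries.items():
        data = {}
        for source in sorted(sources):
            if source not in cache:
                cache[source] = fetch(source)  # hypothetical remote call
                transfers += 1
            data[source] = cache[source]
        results[query_id] = data
    return results, transfers


# Hypothetical workload: three concurrent queries over three sources.
queries = {"q1": {"A", "B"}, "q2": {"B", "C"}, "q3": {"A", "B"}}
groups = group_by_source(queries)
results, transfers = run_batch(queries, lambda s: f"rows from {s}")
naive_transfers = sum(len(sources) for sources in queries.values())
```

In this toy workload the shared batch performs one transfer per distinct source, whereas issuing each query independently would perform one transfer per query-source pair.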