Quality Estimation of Deep Web Data Sources for Data Fusion
Ming Sun, Huitao Dou, Qingzhong Li, Zhongmin Yan
DOI: https://doi.org/10.1016/j.proeng.2012.01.313
2012-01-01
Abstract: In a variety of domains, the amount of Web information is growing rapidly and the types of data sources are proliferating. Different data sources often provide heterogeneous or conflicting data, so data fusion is needed to resolve the conflicts and find the true values. Several advanced techniques already resolve conflicts by considering the accuracy of sources, the freshness of sources, and the dependencies between sources, and these strategies have achieved good results. To further improve data fusion, we propose a quality estimation model for Deep Web data sources (DSQ). Based on the characteristics of data fusion, the model selects three dimensions of factors (data quality, interface quality, and service quality) as estimation criteria and uses them to estimate the quality of data sources. We then use the estimation results to improve data fusion. Experiments show that the model accurately estimates the quality of Deep Web data sources and significantly improves data fusion.
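To make the idea concrete, the following is a minimal sketch of how a per-source quality score built from the three dimensions might feed into conflict resolution. The abstract does not give the DSQ formulas, so everything here is an assumption for illustration: the weights, the linear combination, the quality-weighted voting rule, and the names `WEIGHTS`, `source_quality`, and `fuse` are all hypothetical, and the sample scores are made up.

```python
from collections import defaultdict

# Hypothetical weights for the three dimensions named in the abstract.
# The abstract does not state how the dimensions are combined.
WEIGHTS = {"data": 0.5, "interface": 0.25, "service": 0.25}


def source_quality(scores, weights=WEIGHTS):
    """Combine per-dimension scores in [0, 1] into one weighted score (assumed linear)."""
    return sum(weights[dim] * scores[dim] for dim in weights)


def fuse(claims, quality):
    """Return the claimed value whose supporting sources have the highest total quality."""
    support = defaultdict(float)
    for source, value in claims.items():
        support[value] += quality[source]
    return max(support, key=support.get)


# Made-up dimension scores for three illustrative sources.
sources = {
    "A": {"data": 0.9, "interface": 0.8, "service": 0.7},
    "B": {"data": 0.4, "interface": 0.6, "service": 0.5},
    "C": {"data": 0.2, "interface": 0.4, "service": 0.4},
}
quality = {name: source_quality(scores) for name, scores in sources.items()}

# Conflicting values for the same attribute, one claim per source.
claims = {"A": "19.99", "B": "24.99", "C": "24.99"}
best = fuse(claims, quality)
print(best)
```

In this toy data, two lower-quality sources agree on one value while a single higher-quality source reports another; under the assumed weighting the higher-quality source outweighs the majority, which is the kind of effect a quality estimate is meant to have on fusion.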