Assessing Data Quality Within Available Context

Jingyu Han, Dawei Jiang, Zhiming Ding
DOI: https://doi.org/10.1142/9789814273497_0004
2009-01-01
Abstract: Data quality rating is an important issue in many scenarios, such as data integration and cooperative information systems (CIS). It is now widely accepted that data quality can be measured along multiple dimensions, such as accuracy and completeness. Most existing work focuses on analyzing these dimensions qualitatively, and the analysis depends heavily on experts' knowledge. Little work has addressed how to quantify data quality dimensions automatically. To solve this challenging problem, we propose a novel approach to automatically Quantify Dimensions within Context (QDC). Data quality can be gauged by the discrepancy between a data view and the perfect representation of its entity. Since the perfect representation of an entity is difficult to obtain, we propose to approximate it within the entity's available context, so that quality dimensions can be quantified within this context scope. By borrowing entropy concepts from information theory, the measurement can be given readily for different types of data. In this way, the two most important quality dimensions, accuracy and completeness, are properly quantified. Our QDC approach not only gives an objective score and ranking in a cooperative multi-source environment but also avoids laborious human interaction. As an automatic quality rating solution, our approach stands out, especially for large-scale datasets. Theory and experiments show that our approach performs well for quality rating.