TODQA: Efficient Task-Oriented Data Quality Assessment
Xiang Xiao, Anran Li, Yunting Xie, Xiangyang Li, Lan Zhang, Jianwei Qian
DOI: https://doi.org/10.1109/MSN48538.2019.00028
2019-12-01
Abstract: Data quality assessment is vital for many information services, ranging from sensor networks to smart city systems. Current data quality assessments, however, are often derived from intrinsic data characteristics, are disconnected from specific application contexts, or are not applicable or efficient for large datasets. In this work, we propose a novel task-oriented data quality assessment framework that balances intrinsic and contextual quality. We carefully craft the assessment metrics, quantify them, and fuse them to rank candidate datasets by quality for a given task. To improve system efficiency, we design two fast calculation algorithms: one quantifies the relationship between datasets and the task, and the other quantifies the distribution of data items. We conduct extensive evaluations on six public image datasets (460,247 images in total) and four text document datasets (37,372 documents in total) to assess the efficacy and efficiency of our design. Experimental results show that our algorithms save about 90% of computing time with little accuracy loss, which validates the feasibility and effectiveness of our framework for large datasets.
Environmental Science, Engineering, Computer Science