CleanCloud: Cleaning Big Data on Cloud.

Hongzhi Wang, Xiaoou Ding, Xiangying Chen, Jianzhong Li, Hong Gao
DOI: https://doi.org/10.1145/3132847.3133187
2017-01-01
Abstract: We describe CleanCloud, a system for cleaning big data in the cloud based on the Map-Reduce paradigm. Using Map-Reduce, the system detects and repairs various data quality problems in big data. We demonstrate the following features of CleanCloud: (a) support for cleaning multiple data quality problems in big data; (b) a visual tool for monitoring the status of the cleaning process and tuning its parameters; (c) a friendly interface for data input, configuration, and collection of the cleaned data. CleanCloud is a promising system that provides a scalable and effective data cleaning mechanism for big data stored in either files or databases.
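To make the detect-and-repair idea concrete, here is a minimal single-process sketch of a Map-Reduce style cleaning pass. It is not CleanCloud's implementation: the abstract does not specify its algorithms, so the rule used here (a zip code should determine a single city, repaired by majority vote), the field names `zip` and `city`, and the function names are all assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def map_phase(records):
    """Map: emit (zip, city) pairs. Field names are hypothetical."""
    for rec in records:
        yield rec["zip"], rec["city"]


def shuffle(pairs):
    """Group values by key, as a Map-Reduce framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: flag a conflict in one group and pick the majority value."""
    counts = Counter(values)
    majority, _ = counts.most_common(1)[0]
    has_conflict = len(counts) > 1
    return key, majority, has_conflict


def clean(records):
    """Detect keys with conflicting values and repair them by majority vote."""
    repairs = {}
    dirty_keys = set()
    for key, values in shuffle(map_phase(records)).items():
        _, majority, has_conflict = reduce_phase(key, values)
        repairs[key] = majority
        if has_conflict:
            dirty_keys.add(key)
    # Apply the chosen repair to every record without mutating the input.
    cleaned = [{**rec, "city": repairs[rec["zip"]]} for rec in records]
    return cleaned, dirty_keys
```

In a real deployment the map and reduce functions would run in parallel on a cluster, and the framework would perform the shuffle; majority voting is only one possible repair strategy.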