Cleanix: A Big Data Cleaning Parfait

Hongzhi Wang, Mingda Li, Yingyi Bu, Jianzhong Li, Hong Gao, Jiacheng Zhang
DOI: https://doi.org/10.1145/2661829.2661837
2014-01-01
Abstract: In this demo, we present Cleanix, a prototype system for cleaning relational Big Data. Cleanix takes data integrated from multiple data sources and cleans them on a shared-nothing machine cluster. The backend system is built on top of an extensible and flexible data-parallel substrate, the Hyracks framework. Cleanix supports various data cleaning tasks such as abnormal value detection and correction, incomplete data filling, de-duplication, and conflict resolution. We demonstrate that Cleanix is a practical tool that supports effective and efficient data cleaning at large scale.
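To make the four cleaning tasks named in the abstract concrete, here is a minimal single-process sketch in Python. It is not Cleanix's implementation and does not use Hyracks. The `age` validity range, the normalized-name matching key, CRC32 hash partitioning (standing in for a shared-nothing data layout), majority-vote conflict resolution, mean imputation, the order of the steps, and all function names are hypothetical choices made for illustration only.

```python
import zlib
from collections import Counter, defaultdict
from statistics import mean

# Hypothetical validity rule: ages outside [0, 150] are treated as abnormal.
AGE_RANGE = (0, 150)


def correct_abnormal(records: list[dict]) -> list[dict]:
    """Abnormal value detection and correction (sketch): an out-of-range age is
    replaced by None so that the filling step can repair it later."""
    lo, hi = AGE_RANGE
    out = []
    for r in records:
        r = dict(r)  # copy so the input is not mutated
        age = r.get("age")
        if age is not None and not (lo <= age <= hi):
            r["age"] = None
        out.append(r)
    return out


def match_key(r: dict) -> str:
    """Hypothetical matching key: the trimmed, lowercased name.
    Real entity matching would use fuzzier similarity."""
    return r["name"].strip().lower()


def partition(records: list[dict], n_workers: int) -> list[list[dict]]:
    """Hash-partition on the matching key so that candidate duplicates land on
    the same worker (an assumed shared-nothing layout, not Cleanix's plan)."""
    parts = [[] for _ in range(n_workers)]
    for r in records:
        parts[zlib.crc32(match_key(r).encode()) % n_workers].append(r)
    return parts


def resolve(values: list) -> object:
    """Conflict resolution by majority vote over non-missing values;
    Counter.most_common breaks ties by first occurrence."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]


def dedup_and_resolve(part: list[dict]) -> list[dict]:
    """De-duplication: group records sharing a matching key, then merge each
    group into one record attribute by attribute."""
    groups = defaultdict(list)
    for r in part:
        groups[match_key(r)].append(r)
    merged = []
    for group in groups.values():
        attrs = sorted({a for r in group for a in r})
        merged.append({a: resolve([r.get(a) for r in group]) for a in attrs})
    return merged


def fill_missing(records: list[dict], attr: str) -> list[dict]:
    """Incomplete data filling: replace missing values with the attribute mean."""
    known = [r[attr] for r in records if r.get(attr) is not None]
    fill = mean(known) if known else None
    return [{**r, attr: fill} if r.get(attr) is None else r for r in records]


def clean(records: list[dict], n_workers: int = 4) -> list[dict]:
    records = correct_abnormal(records)
    # Each partition is de-duplicated independently, as a worker would do.
    merged = [r for p in partition(records, n_workers) for r in dedup_and_resolve(p)]
    return fill_missing(merged, "age")


rows = [
    {"name": "Alice", "city": "Harbin", "age": 30},
    {"name": "alice ", "city": "Harbin", "age": 30},
    {"name": "ALICE", "city": "Beijing", "age": 30},
    {"name": "Bob", "city": "Irvine", "age": 999},
    {"name": "Carol", "city": "Irvine", "age": 40},
]
for r in sorted(clean(rows), key=match_key):
    print(r)
```

In this toy run, the three "Alice" variants collapse into one record whose city is resolved to "Harbin" by majority vote, and Bob's implausible age is first nulled and then filled with the mean of the remaining ages. Partitioning on the same key used for matching is what lets each worker de-duplicate locally without exchanging data; a real system would also need a distributed aggregate for the global mean.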