EntityManager: An Entity-Based Dirty Data Management System

Hongzhi Wang, Xueli Liu, Jianzhong Li, Xing Tong, Long Yang, Yakun Li
DOI: https://doi.org/10.1007/978-3-642-37450-0_38
2013-01-01
Abstract: Dirty data exist in many systems, and efficient, effective management of such data is in demand. Because data cleaning may lose useful data and introduce new dirty data, we attempt to manage dirty data without cleaning and to retrieve query results according to users' quality requirements. Because the entity is the unit for understanding objects in the world, and much dirty data arises from different descriptions of the same real-world entity, we propose EntityManager, a dirty data management system that uses the entity as its basic unit and keeps conflicts in the data as uncertain attributes. Although the query language is SQL, queries in our system have different semantics on dirty data. In the demonstration, we will show a new philosophy for managing dirty data around entities. We will present our prototype, which allows users to load dirty data and query it according to their requirements.
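The idea of keeping conflicts as uncertain attributes and filtering results by a quality requirement can be illustrated with a minimal Python sketch. Nothing here is taken from the EntityManager implementation: the names `Entity`, `build_entity`, and `select`, the frequency-based confidence score, and the sample records are all assumptions made for illustration, and the abstract does not say how the system represents uncertainty or quality.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Hypothetical entity record; not the paper's actual data model."""
    entity_id: str
    # attribute name -> list of (candidate value, confidence) pairs
    attributes: dict = field(default_factory=dict)


def build_entity(entity_id, records):
    """Merge records describing one entity, keeping conflicting values.

    Assumption: confidence is the fraction of records supporting a value.
    """
    counts = {}
    for record in records:
        for name, value in record.items():
            per_attribute = counts.setdefault(name, {})
            per_attribute[value] = per_attribute.get(value, 0) + 1

    attributes = {}
    for name, value_counts in counts.items():
        total = sum(value_counts.values())
        attributes[name] = [(v, c / total) for v, c in value_counts.items()]
    return Entity(entity_id, attributes)


def select(entities, attribute, value, min_confidence):
    """Return ids of entities whose attribute equals value with
    confidence at least min_confidence (the assumed quality requirement)."""
    result = []
    for entity in entities:
        candidates = entity.attributes.get(attribute, [])
        confidence = sum(c for v, c in candidates if v == value)
        if confidence >= min_confidence:
            result.append(entity.entity_id)
    return result


# Three dirty records that describe the same real-world person.
records = [
    {"name": "J. Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Boston"},
    {"name": "John Smith", "city": "Austin"},
]
person = build_entity("e1", records)

loose = select([person], "city", "Boston", 0.6)
strict = select([person], "city", "Boston", 0.7)
```

In this sketch, two of the three records support `city = "Boston"`, so the entity is returned under the looser threshold of 0.6 but not under the stricter 0.7. The same selection condition thus yields different results depending on the quality the user asks for.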