An Aggregation Query Processing Method of Dirty Database Based on Clustering

Guohua Jiang, Hongzhi Wang, Jianzhong Li, Hong Gao
2009-01-01
Journal of Computer Research and Development
Abstract: In real-world databases, dirty data such as incomplete, inconsistent, and duplicate data reduce the effectiveness of database applications. Retrieving data with a clean-degree assurance from a database that contains dirty data raises new challenges. Aggregation queries are the basis of statistical analysis. This paper proposes a method for processing aggregation queries on dirty data with a clean-degree, focusing on aggregation queries with a "group by" clause. In a dirty database, one tuple may belong to multiple groups, so the proposed method uses overlapping clustering methods to group the tuples and retrieves groups with a clean-degree. Based on these groups, the aggregated results and their clean-degree, expressed as a probability, are computed. The method can handle several kinds of aggregation functions as well as aggregation queries with constraints. Experimental results show the efficiency of the proposed algorithms.
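The idea of aggregating over overlapping groups can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes an overlapping clustering step has already assigned each tuple a membership probability for every group it may belong to, and it computes a probability-weighted SUM and COUNT per group. The function name `expected_group_aggregates`, the input layout, the sample data, and the use of the mean membership probability as a stand-in for a group's clean-degree are all hypothetical choices made for illustration.

```python
from collections import defaultdict


def expected_group_aggregates(rows):
    """Probability-weighted SUM and COUNT over overlapping groups.

    rows: iterable of (value, {group_label: membership_probability}).
    The membership probabilities are assumed to come from an earlier
    overlapping clustering step that is not shown here.
    """
    totals = defaultdict(lambda: {"sum": 0.0, "count": 0.0, "n": 0})
    for value, memberships in rows:
        # A tuple contributes to every group it may belong to,
        # weighted by how likely that membership is.
        for group, prob in memberships.items():
            acc = totals[group]
            acc["sum"] += prob * value
            acc["count"] += prob
            acc["n"] += 1

    result = {}
    for group, acc in totals.items():
        result[group] = {
            "expected_sum": acc["sum"],
            "expected_count": acc["count"],
            # Assumption: mean membership probability used as a simple
            # proxy for the group's clean-degree.
            "clean_degree": acc["count"] / acc["n"],
        }
    return result


# Hypothetical data: the second tuple has an ambiguous group label,
# so it is split evenly between two candidate groups.
rows = [
    (100.0, {"IBM": 1.0}),
    (50.0, {"IBM": 0.5, "HP": 0.5}),
    (30.0, {"HP": 1.0}),
]
result = expected_group_aggregates(rows)
```

In this sketch the ambiguous tuple contributes half of its value to each candidate group, so neither group's aggregate is reported with full confidence.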