The equivalence theory based on fuzzy theory

Huayang Li, Yubao Liu, Youkui Li, Hao Gui
DOI: https://doi.org/10.1109/ICMLC.2004.1382388
2004-01-01
Abstract: Data cleaning is an important task in building data warehouses and in data mining. Equivalence theory concerns how to define when two records are equivalent, that is, duplicates of each other, and it is an important problem in data cleaning. This paper proposes a new equivalence theory and a concept of equivalence degree, both based on fuzzy theory, and puts forward a corresponding method for calculating equivalence degrees. Moreover, on the basis of this equivalence theory, the keyword "report" is introduced and a method for clustering and handling duplicated records is presented. Compared with traditional equivalence theory, the new one is more convenient for generating rules and for clustering and handling duplicated records, and it reduces the time users spend dealing with individual LOG files. In addition, the paper puts forward an interactive method based on clustering, which saves users much labor.