MapDupReducer: detecting near duplicates over massive datasets.

Chaokun Wang, Jianmin Wang, Xuemin Lin, Wei Wang, Haixun Wang, Hongsong Li, Wanpeng Tian, Jun Xu, Rui Li
DOI: https://doi.org/10.1145/1807167.1807296
2010-01-01
Abstract: Near-duplicate detection benefits many applications, e.g., online news selection over the Web by keyword search. The purpose of this demo is to show the design and implementation of MapDupReducer, a MapReduce-based system capable of efficiently detecting near duplicates over massive datasets.
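The abstract does not describe MapDupReducer's internals. As a rough illustration of how near-duplicate detection can be split into map and reduce steps, here is a minimal sketch of a generic technique: a Jaccard similarity join with prefix filtering. It is not presented as the authors' algorithm. The word-set tokenization, the threshold, the in-memory `map_reduce` helper, and all function names are hypothetical. A real deployment would run the same mapper and reducer on Hadoop or a similar framework.

```python
import math
import re
from collections import Counter, defaultdict
from itertools import combinations


def tokens(text):
    """Lowercased word set (hypothetical feature choice; real systems may use shingles)."""
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def map_reduce(records, mapper, reducer):
    """Hypothetical in-memory stand-in for map -> shuffle (group by key) -> reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    out = []
    for key, values in groups.items():
        out.extend(reducer(key, values))
    return out


def near_duplicates(docs, threshold=0.8):
    """Return sorted (id_a, id_b) pairs whose token sets have Jaccard >= threshold."""
    sets = {doc_id: tokens(text) for doc_id, text in docs.items()}
    # Global token order: rarest first (ties alphabetical) so prefixes are selective.
    freq = Counter(t for s in sets.values() for t in s)

    def mapper(item):
        doc_id, s = item
        ordered = sorted(s, key=lambda t: (freq[t], t))
        # Prefix filtering: if J(x, y) >= threshold, the prefixes of this length
        # share at least one token. The epsilon only lengthens the prefix, guarding
        # against float error such as 0.7 * 10 == 7.000000000000001.
        min_overlap = math.ceil(threshold * len(ordered) - 1e-9)
        for t in ordered[: len(ordered) - min_overlap + 1]:
            yield t, doc_id

    def reducer(token, doc_ids):
        # Every pair of documents sharing a prefix token is a candidate.
        for a, b in combinations(sorted(doc_ids), 2):
            yield (a, b)

    candidates = set(map_reduce(sets.items(), mapper, reducer))
    # Verification: exact Jaccard only for candidate pairs, not all n^2 pairs.
    return sorted(
        (a, b) for a, b in candidates if jaccard(sets[a], sets[b]) >= threshold
    )
```

For example, with `docs = {"d1": "the quick brown fox jumps over the lazy dog", "d2": "the quick brown fox jumped over the lazy dog"}` the two word sets share 7 of 9 distinct words, so `near_duplicates(docs, 0.75)` reports `("d1", "d2")` while a threshold of 0.8 does not. The filter prunes pairs that cannot reach the threshold before the costly exact comparison, which is what makes such joins practical on large inputs.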
What problem does this paper attempt to address?