Basic Data Operators For Entity Resolution

Hongzhi Wang
DOI: https://doi.org/10.4018/978-1-4666-5198-2.ch011
2014-01-01
Abstract: This chapter focuses on the basic data operators for entity resolution: similarity search, similarity join, and clustering on sets or strings. These three problems are of increasing complexity, and the solutions to the simpler problems serve as building blocks for the harder ones. The authors first introduce solutions to similarity search, covering gram-based and sketch-based algorithms. The chapter then turns to similarity join, covering both exact and approximate algorithms. Finally, the authors address the problem of clustering similar strings in a set, which can be applied to duplicate detection in databases.