Privacy-Preserving Fuzzy Matching Using A Public Reference Table

Chaoyi Pang, Lifang Gu, David Hansen, Anthony Maeder
DOI: https://doi.org/10.1007/978-3-642-00179-6_5
2009-01-01
Abstract: In this paper we address the problem of matching data from different databases using a third party, where the actual data cannot be disclosed. The aim is to provide a mechanism for improved matching results across databases while preserving the privacy of sensitive information in those databases. This is particularly relevant for health-related databases, where bringing together data about patients from multiple databases enables important medical research, but the sensitive nature of the data requires that identifying information never be disclosed. The method described uses a public reference table to match people's names in different databases without revealing identifying information to any party outside the originating data source. An advantage of our algorithm is that it provides a mechanism for dealing with typographical or other errors in the data. The key features of our proposed approach are: (1) original private data from individual custodians are never revealed to any other party, because data comparison is performed locally by each custodian and only the comparison results, which are data in the reference table, are sent; (2) the third party performs the match based on encrypted values in the public reference table and some distance information. Experimental results show that our proposed method performs fuzzy matching (similarity join) with an accuracy comparable to that of conventional fuzzy matching algorithms, without revealing any identifying information.
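The two key features above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the distance measure, the encryption scheme, or the matching rule, so the sketch assumes Levenshtein edit distance, a keyed hash (HMAC-SHA256 with a key shared by the custodians but not the third party) standing in for "encrypted values", and a match rule that sums the two distances to a shared reference entry, which by the triangle inequality is an upper bound on the distance between the private names. All function names, the sample reference table, and the thresholds are hypothetical.

```python
import hashlib
import hmac


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed here; the abstract names no measure)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def custodian_encode(records, reference, key, radius):
    """Runs locally at a custodian. For every private name, emit the keyed
    hash of each reference entry within `radius`, plus the distance.
    The private name itself never leaves this function."""
    out = []
    for record_id, name in records.items():
        for ref in reference:
            d = edit_distance(name, ref)
            if d <= radius:
                token = hmac.new(key, ref.encode(), hashlib.sha256).hexdigest()
                out.append((record_id, token, d))
    return out


def third_party_match(enc_a, enc_b, threshold):
    """Runs at the third party, which sees only tokens and distances.
    Two records match if they share a reference token and the summed
    distances (an upper bound on their true distance) fit the threshold."""
    by_token = {}
    for record_id, token, d in enc_b:
        by_token.setdefault(token, []).append((record_id, d))
    matches = {}
    for id_a, token, d_a in enc_a:
        for id_b, d_b in by_token.get(token, []):
            bound = d_a + d_b
            if bound <= threshold:
                pair = (id_a, id_b)
                matches[pair] = min(bound, matches.get(pair, bound))
    return matches


# Hypothetical data: a tiny public reference table and two private databases.
reference = ["smith", "smyth", "jones", "johnson"]
db_a = {"a1": "smith", "a2": "jonse"}
db_b = {"b1": "smiht", "b2": "johnsen"}
key = b"shared-by-custodians-only"  # assumption: unknown to the third party

enc_a = custodian_encode(db_a, reference, key, radius=2)
enc_b = custodian_encode(db_b, reference, key, radius=2)
matches = third_party_match(enc_a, enc_b, threshold=2)
print(matches)
```

In this toy run the misspelled "smiht" in one database is linked to "smith" in the other through the shared reference entry, while the third party only ever handles hashed reference values and integers. The choice of `radius` and `threshold` trades recall against false matches, and a real deployment would need a much larger reference table than the four entries used here.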