An Enhanced Privacy-Preserving Record Linkage Approach for Multiple Databases
Han Shumin, Shen Derong, Nie Tiezheng, Kou Yue, Yu Ge
DOI: https://doi.org/10.1007/s10586-022-03590-7
2022-01-01
Cluster Computing
Abstract: For research purposes, organizations often need to share and link records that belong to the same individual while protecting privacy, a task referred to as privacy-preserving record linkage (PPRL). Various approaches have been developed to tackle this problem; however, it remains challenging because of the massive amount of data, the multiple data sources, and ‘dirty’ data. In this paper, an enhanced approximate multi-party PPRL (MP-PPRL) approach is therefore proposed to improve privacy, scalability, and linkage quality. For privacy, the Bloom filter (BF) is so far a better and more efficient masking technique than the alternatives, so the records are encoded into BFs. However, BFs may be compromised through frequency-based attacks. To enhance privacy, a distributed protocol that introduces multiple linkage units (Multi-LUs) to resist frequency-based attacks is proposed. For scalability, we develop a blocking technique based on the sorted nearest neighborhood (SNN) approach for clustering similar BFs across multiple databases, called BF-SNN, which dramatically reduces complexity. For linkage quality, a personalized threshold that varies with the level of ‘dirty’ data is introduced, which provides more accurate error tolerance for ‘dirty’ data and consequently improves linkage quality. An analysis and an empirical study on large real-world datasets show the benefit of the proposed approach.
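To illustrate the Bloom filter masking step mentioned in the abstract, the following is a minimal sketch of how a field value might be encoded and compared. It is not the authors' implementation: the q-gram splitting, the double-hashing scheme, the Dice coefficient as the similarity measure, and all names and parameters (`qgrams`, `encode_bf`, `dice`, `m`, `k`, `q`) are assumptions chosen for illustration, based on common practice in Bloom-filter-based record linkage.

```python
import hashlib


def qgrams(value, q=2):
    """Split a padded, lower-cased string into its set of q-grams."""
    pad = "_" * (q - 1)
    padded = pad + value.lower() + pad
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def encode_bf(value, m=1000, k=10, q=2):
    """Return the set of bit positions set to 1 in an m-bit Bloom filter.

    Assumption: double hashing, position_i = (h1 + i * h2) mod m,
    with two salted SHA-256 digests standing in for h1 and h2.
    """
    positions = set()
    for gram in qgrams(value, q):
        h1 = int(hashlib.sha256(b"h1:" + gram.encode()).hexdigest(), 16)
        h2 = int(hashlib.sha256(b"h2:" + gram.encode()).hexdigest(), 16)
        for i in range(k):
            positions.add((h1 + i * h2) % m)
    return positions


def dice(bf_a, bf_b):
    """Dice coefficient of two Bloom filters given as sets of 1-bit positions."""
    if not bf_a and not bf_b:
        return 1.0
    return 2 * len(bf_a & bf_b) / (len(bf_a) + len(bf_b))


if __name__ == "__main__":
    smith, smyth, jones = (encode_bf(n) for n in ("smith", "smyth", "jones"))
    print("smith vs smyth:", round(dice(smith, smyth), 3))
    print("smith vs jones:", round(dice(smith, jones), 3))
```

In this sketch, values that share most of their q-grams (such as a name and a misspelling of it) yield Bloom filters with many common 1-bits, so a similarity threshold can tolerate some ‘dirty’ data without comparing the plaintext values.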