Improved record linkage for encrypted identifying data

Chaoyi Pang, David Hansen
2006-01-01
Abstract: The health data integration project at the E-Health Research Centre is researching ways of improving the integration of health and health-related data while maintaining the privacy and security of the data. One such way is to improve the mechanisms for matching patients across databases when the identifying information must not be revealed, even during the linkage step.

Background: With health-related data spread across many administrative and clinical databases, the ability to bring the data together dynamically is important. This could support clinical decision making, administrative reporting, or access to data for clinical research.

Objectives: Mechanisms for blindfolded record linkage have already been published. One way to further strengthen the security and privacy of these algorithms is to encrypt the identifying data, such as name and date of birth, before performing the linkage step. However, due to the nature of encryption algorithms, encrypted data can only be matched exactly, which limits the ability to allow for errors in the data. This work presents a mechanism that allows matching of encrypted data when there may be errors in the data.

Methods: A public reference table which is common to both data custodians is used. Each value in the original data is compared to the entries in the public reference table using an edit distance function. Names from the reference table which are within a given distance of the original value are sent to the linker. The data from the two data custodians are then compared to decide the likelihood of two records being a match.

Results: The method described in this paper performs better than other methods which support matching of encrypted data, such as exact matching or matching using Soundex.

Discussion and Conclusion: The method described in this paper can be used to improve the level of record matching in tools where access to identifying data is prohibited. This method is currently being added to the HDI software tool as another mechanism for matching records between databases.
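
The Methods step can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: the abstract does not name the encryption scheme, so a keyed hash (HMAC-SHA256) stands in for it; the abstract does not say how the linker scores two records, so a simple set overlap (Jaccard) is used; and the reference table `REFERENCE`, the key `KEY`, and the function names are hypothetical.

```python
import hashlib
import hmac


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with the standard two-row table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def neighbours(value: str, reference: list, max_dist: int) -> set:
    """Reference-table entries within max_dist edits of the original value."""
    return {r for r in reference
            if edit_distance(value.lower(), r.lower()) <= max_dist}


def encode(names: set, key: bytes) -> set:
    """Assumption: a keyed hash stands in for the paper's encryption step."""
    return {hmac.new(key, n.lower().encode("utf-8"), hashlib.sha256).hexdigest()
            for n in names}


def match_score(a: set, b: set) -> float:
    """Assumption: Jaccard overlap as a stand-in for the linker's comparison."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical public reference table and shared key.
REFERENCE = ["smith", "smyth", "smithe", "jones", "johns", "brown"]
KEY = b"hypothetical-key-shared-by-both-custodians"

# Each custodian encodes the reference names close to its own value.
sent_a = encode(neighbours("smith", REFERENCE, 1), KEY)  # custodian A
sent_b = encode(neighbours("smyth", REFERENCE, 1), KEY)  # custodian B, variant spelling
sent_c = encode(neighbours("jones", REFERENCE, 1), KEY)  # unrelated record

# The linker sees only encoded values and compares the sets.
score_variant = match_score(sent_a, sent_b)
score_other = match_score(sent_a, sent_c)

# Exact matching of the encoded original values fails for the variant spelling.
exact_match = encode({"smith"}, KEY) == encode({"smyth"}, KEY)

print(score_variant, score_other, exact_match)
```

In this toy example the two spellings share encoded reference names, so the linker gives them a non-zero score even though their encoded original values differ, which is the limitation of exact matching that the Objectives describe. The distance threshold and the scoring rule would need to be tuned to real data.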