Bayesian Record Linkage with Variables in One File

Gauri Kamat, Mingyang Shan, Roee Gutman
DOI: https://doi.org/10.1002/sim.9894
2023-08-31
Abstract: In many healthcare and social science applications, information about units is dispersed across multiple data files. Linking records across files is necessary to estimate the associations of interest. Common record linkage algorithms only rely on similarities between linking variables that appear in all the files. Moreover, analysis of linked files often ignores errors that may arise from incorrect or missed links. Bayesian record linking methods allow for natural propagation of linkage error, by jointly sampling the linkage structure and the model parameters. We extend an existing Bayesian record linkage method to integrate associations between variables exclusive to each file being linked. We show analytically, and using simulations, that this method can improve the linking process, and can yield accurate inferences. We apply the method to link Meals on Wheels recipients to Medicare Enrollment records.
Methodology, Applications
What problem does this paper attempt to address?
This paper addresses how to accurately link records across files when information about the same units is scattered across multiple data files, as is common in healthcare and social science research. Specifically, it extends an existing Bayesian record linkage method so that it also uses the associations between variables that appear in only one of the files, with the aim of improving linkage accuracy.

### Main problem description of the paper

1. **Problem of information dispersion**:
   - In many healthcare and social science studies, information about individuals is scattered across multiple data files.
   - These records must be linked before the associations of interest can be estimated.
2. **Limitations of existing methods**:
   - Common record linkage algorithms rely only on similarities between the linking variables that appear in all files.
   - Analyses of linked files often ignore errors caused by incorrect or missed links.
3. **Advantages of the Bayesian record linkage method**:
   - Bayesian record linkage propagates linkage error naturally by jointly sampling the linkage structure and the model parameters.
   - Existing Bayesian methods mainly focus on variables common to all files and ignore the associations between variables unique to each file.
4. **Innovations of this paper**:
   - Extend an existing Bayesian record linkage method to integrate the associations between variables exclusive to each file.
   - Show analytically and through simulations that this method can improve the linking process and yield accurate inferences.

### Specific application

The authors apply the method to link Meals on Wheels recipients to Medicare enrollment records, illustrating its use in a practical setting.
### Summary

The paper's core contribution is an improved Bayesian record linkage method for information scattered across multiple files. By modeling the associations between variables that are exclusive to each file, in addition to the similarities between shared linking variables, the method can improve the linking process and account for linkage error, which gives a more reliable basis for subsequent statistical analysis.
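To make the central idea concrete, the following is a minimal Python sketch of how an association between file-exclusive variables can help separate candidate links that look identical on the shared linking fields. It is not the authors' implementation: the full method samples the linkage structure and model parameters jointly, whereas this toy fixes all parameters and scores the candidates for a single record. Every name and number here (`M_PROBS`, `U_PROBS`, the linear-normal model for `y` given `x`, the candidate values) is a hypothetical assumption chosen for illustration.

```python
import math


def normal_logpdf(value, mean, sd):
    """Log density of a normal distribution with the given mean and standard deviation."""
    return -0.5 * math.log(2 * math.pi * sd ** 2) - (value - mean) ** 2 / (2 * sd ** 2)


def comparison_log_ratio(agreements, m_probs, u_probs):
    """Log likelihood ratio (link vs. non-link) from agreement on shared linking fields."""
    total = 0.0
    for agree, m, u in zip(agreements, m_probs, u_probs):
        total += math.log(m / u) if agree else math.log((1 - m) / (1 - u))
    return total


def association_log_ratio(x, y, intercept, slope, resid_sd, y_mean, y_sd):
    """Log likelihood ratio contributed by the file-exclusive variables.

    Assumed toy model: for a true link, y ~ Normal(intercept + slope * x, resid_sd);
    for a non-link, y is unrelated to x and follows Normal(y_mean, y_sd).
    """
    linked = normal_logpdf(y, intercept + slope * x, resid_sd)
    unlinked = normal_logpdf(y, y_mean, y_sd)
    return linked - unlinked


def match_probabilities(log_ratios):
    """Normalize log ratios into probabilities.

    Assumes exactly one candidate is the true link and equal prior weight on each.
    """
    peak = max(log_ratios)  # subtract the maximum for numerical stability
    weights = [math.exp(value - peak) for value in log_ratios]
    total = sum(weights)
    return [weight / total for weight in weights]


# Hypothetical agreement probabilities for two shared linking fields.
M_PROBS = [0.95, 0.90]  # P(field agrees | records are a true link)
U_PROBS = [0.10, 0.05]  # P(field agrees | records are not a link)

# One record from file A with exclusive variable x, and two candidates from
# file B with exclusive variable y. Both candidates agree on every linking field.
record_x = 2.0
candidates = [
    {"agreements": [True, True], "y": 4.1},
    {"agreements": [True, True], "y": -1.0},
]

# Scores using the shared linking fields only.
baseline_scores = [
    comparison_log_ratio(c["agreements"], M_PROBS, U_PROBS) for c in candidates
]
baseline = match_probabilities(baseline_scores)

# Scores after adding the association between x and y (parameters assumed known).
adjusted_scores = [
    score + association_log_ratio(
        record_x, c["y"],
        intercept=0.0, slope=2.0, resid_sd=1.0,
        y_mean=0.0, y_sd=math.sqrt(5.0),
    )
    for score, c in zip(baseline_scores, candidates)
]
adjusted = match_probabilities(adjusted_scores)

print("linking fields only:", baseline)
print("with association:   ", adjusted)
```

Because both candidates have the same agreement pattern, the linking fields alone cannot tell them apart. Under the assumed model, the candidate whose `y` is close to the value predicted from `x` receives almost all of the probability once the association term is added. In a full Bayesian treatment the regression parameters would be unknown and updated alongside the linkage structure rather than fixed as they are here.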