Iterative Automated Record Linkage Using Mixture Models

MD Larsen, DB Rubin
DOI: https://doi.org/10.1198/016214501750332956
IF: 4.369
2001-01-01
Journal of the American Statistical Association
Abstract: The goal of record linkage is to link quickly and accurately records that correspond to the same person or entity. Whereas certain patterns of agreements and disagreements on variables are more likely among records pertaining to a single person than among records for different people, the observed patterns for pairs of records can be viewed as arising from a mixture of matches and nonmatches. Mixture model estimates can be used to partition record pairs into two or more groups that can be labeled as probable matches (links) and probable nonmatches (nonlinks). A method is proposed and illustrated that uses marginal information in the database to select mixture models, identifies sets of records for clerks to review based on the models and marginal information, incorporates clerically reviewed data, as they become available, into estimates of model parameters, and classifies pairs as links, nonlinks, or in need of further clerical review. The procedure is illustrated with five datasets from the U.S. Bureau of the Census. It appears to be robust to variations in record-linkage sites. The clerical review corrects classifications of some pairs directly and leads to changes in classification of others through reestimation of mixture models.
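The mixture idea in the abstract can be illustrated with a minimal sketch. It is not the procedure proposed in the paper: it assumes exactly two classes, binary agree/disagree comparisons, and conditional independence of fields within each class, and it omits model selection from marginal information and the clerical-review feedback loop. The function names (`simulate_pairs`, `fit_two_class_mixture`, `classify`), the synthetic data, the starting values, and the 0.9/0.1 posterior cutoffs are all hypothetical choices made for illustration.

```python
import random


def simulate_pairs(n_pairs, p_match, m_true, u_true, seed=0):
    """Generate synthetic agreement patterns (illustrative data only)."""
    rng = random.Random(seed)
    patterns = []
    for _ in range(n_pairs):
        probs = m_true if rng.random() < p_match else u_true
        patterns.append(tuple(int(rng.random() < p) for p in probs))
    return patterns


def fit_two_class_mixture(patterns, n_iter=100, p_match=0.1):
    """EM for a two-class mixture of independent Bernoulli agreement fields.

    Returns (p_match, m, u, posteriors), where m[k] and u[k] are the
    estimated agreement probabilities on field k among matches and
    nonmatches, and posteriors[i] is P(match | pattern of pair i).
    """
    n = len(patterns)
    n_fields = len(patterns[0])
    m = [0.9] * n_fields  # assumed start: matches mostly agree
    u = [0.1] * n_fields  # assumed start: nonmatches mostly disagree
    eps = 1e-6            # keeps probabilities away from 0 and 1
    post = []
    for _ in range(n_iter):
        # E-step: posterior match probability for each pair.
        post = []
        for g in patterns:
            like_m, like_u = p_match, 1.0 - p_match
            for k, agree in enumerate(g):
                like_m *= m[k] if agree else 1.0 - m[k]
                like_u *= u[k] if agree else 1.0 - u[k]
            post.append(like_m / (like_m + like_u))
        # M-step: reestimate the mixing proportion and per-field rates.
        total_m = max(sum(post), eps)
        total_u = max(n - sum(post), eps)
        p_match = min(max(total_m / n, eps), 1.0 - eps)
        for k in range(n_fields):
            agree_m = sum(w for w, g in zip(post, patterns) if g[k])
            agree_u = sum(1.0 - w for w, g in zip(post, patterns) if g[k])
            m[k] = min(max(agree_m / total_m, eps), 1.0 - eps)
            u[k] = min(max(agree_u / total_u, eps), 1.0 - eps)
    return p_match, m, u, post


def classify(posteriors, upper=0.9, lower=0.1):
    """Three-way decision: link, nonlink, or send to clerical review."""
    labels = []
    for w in posteriors:
        if w >= upper:
            labels.append("link")
        elif w <= lower:
            labels.append("nonlink")
        else:
            labels.append("review")
    return labels


if __name__ == "__main__":
    pairs = simulate_pairs(2000, 0.1, [0.9] * 4, [0.15] * 4)
    p_hat, m_hat, u_hat, post = fit_two_class_mixture(pairs)
    labels = classify(post)
    print("estimated match proportion:", round(p_hat, 3))
    print("pairs sent to review:", labels.count("review"))
```

In this sketch, pairs labeled "review" stand in for the sets a clerk would examine. Feeding clerical decisions back in, as the abstract describes, would mean fixing the class membership of reviewed pairs and reestimating the remaining parameters.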