An improved BISG for inferring race from surname and geolocation

Philip Greengard, Andrew Gelman
DOI: https://doi.org/10.48550/arXiv.2304.09126
2024-03-01
Abstract: Bayesian Improved Surname Geocoding (BISG) is a ubiquitous tool for predicting race and ethnicity using an individual's geolocation and surname. Here we demonstrate that statistical dependence of surname and geolocation within racial/ethnic categories in the United States results in biases for minority subpopulations, and we introduce a raking-based improvement. Our method augments the data used by BISG (distributions of race by geolocation and race by surname) with the distribution of surname by geolocation obtained from state voter files. We validate our algorithm on state voter registration lists that contain self-identified race/ethnicity.
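The abstract names two ingredients: the standard BISG posterior, which treats surname and geolocation as independent given race, and a raking step that also matches a surname-by-geolocation table. The Python sketch below illustrates one way these could fit together, using generic iterative proportional fitting on a three-way (surname, geolocation, race) table. It is not the authors' implementation: the function names, array layout, and iteration count are all assumptions made for illustration.

```python
import numpy as np


def bisg_posterior(p_race_given_surname, p_geo_given_race):
    """Standard BISG for one person: P(r | s, g) is proportional to P(r | s) * P(g | r)."""
    unnorm = np.asarray(p_race_given_surname) * np.asarray(p_geo_given_race)
    return unnorm / unnorm.sum()


def independence_table(surname_by_race, geo_by_race):
    """Joint table t[s, g, r] assuming surname and geolocation are independent given race."""
    race_total = surname_by_race.sum(axis=0)  # marginal P(r)
    return surname_by_race[:, None, :] * geo_by_race[None, :, :] / race_total[None, None, :]


def _ratio(target, current):
    # Scaling factor target / current; cells whose current mass is zero stay at zero.
    return np.divide(target, current, out=np.zeros_like(current), where=current > 0)


def rake_three_way(init, surname_by_geo, geo_by_race, surname_by_race, n_iter=200):
    """Iterative proportional fitting: rescale t[s, g, r] to match three two-way margins."""
    t = np.array(init, dtype=float)
    for _ in range(n_iter):
        t *= _ratio(surname_by_geo, t.sum(axis=2))[:, :, None]   # surname x geolocation
        t *= _ratio(geo_by_race, t.sum(axis=0))[None, :, :]      # geolocation x race
        t *= _ratio(surname_by_race, t.sum(axis=1))[:, None, :]  # surname x race
    return t


def raked_posterior(table, s, g):
    """P(r | s, g) read off the raked joint table."""
    cell = table[s, g, :]
    return cell / cell.sum()
```

Starting the raking from `independence_table` means the procedure begins at the BISG estimate and moves away from it only as far as the surname-by-geolocation margin requires. In practice the surname-by-geolocation table would come from voter files, as the abstract describes; here it is just an input array.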
Applications, Methodology