Fairness Issues and Mitigations in (Differentially Private) Socio-demographic Data Processes

Joonhyuk Ko, Juba Ziani, Saswat Das, Matt Williams, Ferdinando Fioretto
2024-08-16
Abstract: Statistical agencies rely on sampling techniques to collect socio-demographic data crucial for policy-making and resource allocation. This paper shows that surveys of important societal relevance introduce sampling errors that unevenly impact group-level estimates, thereby compromising fairness in downstream decisions. To address these issues, this paper introduces an optimization approach modeled on real-world survey design processes, ensuring sampling costs are optimized while maintaining error margins within prescribed tolerances. Additionally, privacy-preserving methods used to determine sampling rates can further impact these fairness issues. The paper explores the impact of differential privacy on the statistics informing the sampling process, revealing a surprising effect: not only the expected negative effect from the addition of noise for differential privacy is negligible, but also this privacy noise can in fact reduce unfairness as it positively biases smaller counts. These findings are validated over an extensive analysis using datasets commonly applied in census statistics.
Cryptography and Security, Artificial Intelligence, Computers and Society
What problem does this paper attempt to address?
This paper addresses fairness problems in socio-demographic data collection and ways to mitigate them. Specifically, it is concerned with the unfairness in downstream decision-making that arises when sampling errors affect the estimates for different groups unevenly. For example, in policy-making and resource allocation, if the estimation error for certain ethnic groups is large, these groups may not receive the attention and support they deserve. To address these problems, the paper makes the following points:

1. **Optimizing the Sampling Scheme**: The paper introduces an optimization-based method for designing the sampling process, which minimizes sampling cost while keeping the error for each group within a prescribed tolerance. The method aims to balance cost and accuracy, in particular by maintaining statistical accuracy across sub-populations.
2. **The Impact of Privacy-Preserving Methods**: The paper also explores how privacy-preserving methods (especially differential privacy) affect the statistics that inform the sampling process. It finds that the noise added for differential privacy has a negligible negative effect on error and can even reduce unfairness, because it positively biases the smaller counts associated with small groups (such as ethnic minorities).
3. **Verification in Practical Applications**: The paper conducts an extensive analysis using US census data to verify the effectiveness of the methods above. The results show that the optimized sampling scheme can not only reduce the sampling cost, but also improve the estimation accuracy for different population groups, thereby reducing unfairness.
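
The link between group size, sample size, and error in the first point can be illustrated with a minimal sketch. It is not the paper's optimization model: it assumes simple random sampling, a textbook normal-approximation margin of error for a proportion, and hypothetical group shares (`shares`), tolerance (`0.05`), and function names chosen only for illustration.

```python
import math

def required_group_sample(margin, z=1.96, p=0.5):
    """Smallest n with z * sqrt(p * (1 - p) / n) <= margin.

    Textbook normal-approximation bound, worst case p = 0.5,
    no finite-population correction (all assumptions of this sketch).
    """
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)

def srs_total_sample(group_shares, margin, z=1.96):
    """Total simple-random-sample size such that the expected number of
    respondents from the smallest group meets the per-group requirement."""
    n_g = required_group_sample(margin, z)
    return math.ceil(n_g / min(group_shares.values()))

def expected_margins(group_shares, n_total, z=1.96, p=0.5):
    """Approximate margin of error per group when n_total people are sampled
    and each group contributes its expected share of respondents."""
    return {
        g: z * math.sqrt(p * (1 - p) / (n_total * share))
        for g, share in group_shares.items()
    }

# Hypothetical population shares for three groups.
shares = {"A": 0.70, "B": 0.25, "C": 0.05}

n_per_group = required_group_sample(0.05)
n_total = srs_total_sample(shares, 0.05)
margins = expected_margins(shares, n_total)

for group, m in margins.items():
    print(f"group {group}: expected margin of error {m:.4f}")
```

Under these assumptions the smallest group determines the total sample size, while larger groups end up with much tighter margins than the tolerance requires. A cost-aware design, such as the one the paper proposes, targets exactly this imbalance.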
Overall, this paper aims to improve the fairness and accuracy of socio-demographic data by optimizing the sampling design and by characterizing how differential privacy affects the statistics that inform it, in order to support more effective policy-making and resource allocation.
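
The positive bias on small counts mentioned above can be reproduced in a small simulation. This is a sketch under stated assumptions rather than the paper's mechanism: it assumes a Laplace mechanism for a counting query with sensitivity 1, followed by clamping released values at zero (a common post-processing step, assumed here as the source of the bias). The counts, `epsilon`, and function names are hypothetical.

```python
import random

def laplace_noise(scale, rng):
    """Laplace(0, scale) sample, built as the difference of two
    independent exponential samples with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def mean_clamped_release(true_count, epsilon, trials, rng):
    """Average released value when Laplace noise of scale 1/epsilon is
    added to a count and negative results are clamped to zero."""
    scale = 1.0 / epsilon  # sensitivity 1 for a counting query (assumption)
    total = 0.0
    for _ in range(trials):
        total += max(0.0, true_count + laplace_noise(scale, rng))
    return total / trials

rng = random.Random(0)  # fixed seed so the simulation is repeatable

# Empirical bias (mean released value minus true count) for a small
# and a large hypothetical group count.
bias_small = mean_clamped_release(2, 0.5, 20000, rng) - 2
bias_large = mean_clamped_release(200, 0.5, 20000, rng) - 200

print(f"bias for true count 2:   {bias_small:+.3f}")
print(f"bias for true count 200: {bias_large:+.3f}")
```

In this setup the clamped release overestimates the small count on average and leaves the large count essentially unbiased. If sampling rates are then derived from the released counts, small groups would tend to be allotted slightly more samples than their true size implies.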