A Differential Privacy Preserving Approach for Logistic Regression in Genome-Wide Association Studies

Ziwei Han, Laifeng Lu, Hai Liu
DOI: https://doi.org/10.1109/nana.2019.00040
2019-01-01
Abstract: Genome-wide association studies identify genetic loci associated with disease occurrence through case-control studies, and logistic regression analysis is used when other covariates such as height, diet, and gender affect disease probability. However, as large amounts of data are generated, malicious inference attacks become possible. Studies have shown that information about individuals can be inferred from published regression coefficients. To publish a regression model with privacy protection, we propose a differentially private method that perturbs the coefficients of the cost function rather than the regression coefficients themselves. We compute the difference between the original cost function and the perturbed cost function, transform it into a polynomial using a Taylor expansion, add noise to the coefficients of that polynomial, and finally obtain the minimal perturbation of the regression coefficients, so that the perturbed regression coefficients can be released with an appropriate level of disturbance. Our verification shows that the method achieves good prediction accuracy while protecting privacy from leakage, making it suitable for releasing regression coefficients.
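The pipeline described in the abstract (Taylor expansion of the cost function, noise on the polynomial coefficients, then minimization) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the exact form of the difference function, so the sketch perturbs the second-order Taylor coefficients of the logistic loss directly, in the style of the functional mechanism. The function names, the `ridge` stabilizer, and the caller-supplied `sensitivity` value are all assumptions made for illustration, and choosing a valid sensitivity for a given feature scaling is left to the caller.

```python
import numpy as np


def taylor_coefficients(X, y):
    """Second-order Taylor coefficients (around w = 0) of the logistic loss.

    Uses log(1 + exp(z)) ~= log 2 + z/2 + z^2/8, so that
    sum_i [log(1 + exp(x_i.w)) - y_i * x_i.w] ~= const + lin.w + w^T quad w.
    """
    lin = ((0.5 - y)[:, None] * X).sum(axis=0)   # coefficients of the degree-1 terms
    quad = 0.125 * (X.T @ X)                      # coefficients of the degree-2 terms
    return lin, quad


def private_logistic_coefficients(X, y, epsilon, sensitivity, ridge=1e-3, seed=None):
    """Illustrative sketch: add Laplace noise to the polynomial coefficients,
    then minimize the noisy quadratic to get perturbed regression coefficients.

    `sensitivity` is an ASSUMED, caller-provided bound on how much the
    polynomial coefficients can change when one record changes.
    """
    rng = np.random.default_rng(seed)
    lin, quad = taylor_coefficients(X, y)

    # Laplace noise with scale sensitivity / epsilon on every coefficient.
    scale = sensitivity / epsilon
    lin_noisy = lin + rng.laplace(0.0, scale, size=lin.shape)
    quad_noisy = quad + rng.laplace(0.0, scale, size=quad.shape)

    # Symmetrize and add a small ridge term (hypothetical stabilizer). This
    # helps conditioning but does not guarantee a positive-definite matrix
    # when the noise is large.
    d = X.shape[1]
    quad_sym = 0.5 * (quad_noisy + quad_noisy.T) + ridge * np.eye(d)

    # Stationary point of w^T Q w + lin.w, i.e. the solution of 2 Q w = -lin.
    return np.linalg.solve(2.0 * quad_sym, -lin_noisy)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(500, 3))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # scale each row to unit norm
    w_true = np.array([1.0, -2.0, 0.5])
    y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ w_true))).astype(float)
    print(private_logistic_coefficients(X, y, epsilon=1.0, sensitivity=1.0, seed=1))
```

Smaller values of `epsilon` give a larger noise scale and therefore stronger perturbation of the released coefficients; the trade-off against prediction accuracy is what the abstract refers to as obtaining an appropriate disturbance.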