Mapping the Privacy-Utility Tradeoff in Mobile Phone Data for Development

Alejandro Noriega-Campero,Alex Rutherford,Oren Lederman,Yves A. de Montjoye,Alex Pentland
DOI: https://doi.org/10.48550/arXiv.1808.00160
2018-08-01
Computers and Society
Abstract:Today's age of data holds high potential to enhance the way we pursue and monitor progress in the fields of development and humanitarian action. We study the relation between data utility and privacy risk in large-scale behavioral data, focusing on mobile phone metadata as paradigmatic domain. To measure utility, we survey experts about the value of mobile phone metadata at various spatial and temporal granularity levels. To measure privacy, we propose a formal and intuitive measure of reidentification risk$\unicode{x2014}$the information ratio$\unicode{x2014}$and compute it at each granularity level. Our results confirm the existence of a stark tradeoff between data utility and reidentifiability, where the most valuable datasets are also most prone to reidentification. When data is specified at ZIP-code and hourly levels, outside knowledge of only 7% of a person's data suffices for reidentification and retrieval of the remaining 93%. In contrast, in the least valuable dataset, specified at municipality and daily levels, reidentification requires on average outside knowledge of 51%, or 31 data points, of a person's data to retrieve the remaining 49%. Overall, our findings show that coarsening data directly erodes its value, and highlight the need for using data-coarsening, not as stand-alone mechanism, but in combination with data-sharing models that provide adjustable degrees of accountability and security.
What problem does this paper attempt to address?