Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Stephen E. Fienberg
DOI: https://doi.org/10.1214/088342306000000240
2006-09-11
Abstract: The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of "selective revelation" and their confidentiality implications.
Statistics Theory
What problem does this paper attempt to address?
This paper addresses how to protect personal privacy and data confidentiality when multiple databases are used for data mining, matching and information sharing, against the backdrop of rapidly growing e-commerce and data warehousing. Specifically, the article explores the following aspects:
1. **Challenges of privacy and confidentiality**: The growth of e-commerce and the widespread availability of online databases put personal privacy at risk. Even with encryption and other nominal forms of protection for individual databases, privacy can still be violated through linkages across multiple databases.
2. **Privacy issues in multi-database linking**: When multiple government and private databases are searched to identify potential terrorists or other criminal activity, the question is how to ensure that these searches do not break the pledges of confidentiality made to the individuals whose data are included. After the events of September 11, 2001, the US federal government expanded its data mining of databases containing personal information, which has raised public concern about privacy protection.
3. **Privacy-protection technologies**: The article discusses several techniques intended to search multiple databases without revealing individual data, such as privacy-preserving data mining (PPDM) and secure multiparty computation, and analyzes their effectiveness and limitations.
4. **The concept of selective revelation**: The article pays particular attention to the problem of matching records across databases and to the concept of "selective revelation", and explores their implications for confidentiality.
5. **Legal and policy recommendations**: In view of the deficiencies of existing laws and regulations in protecting privacy, the article also puts forward improvement measures and policy recommendations to better balance national security needs against the protection of personal privacy.
In summary, the core question of this paper is how to keep personal privacy from being violated when multi-source data are analyzed in the era of big data, and it discusses a variety of technical and policy-level solutions.
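To make the cross-database matching problem more concrete, here is a minimal Python sketch of one simple idea: two database owners exchange keyed hashes of identifiers instead of the identifiers themselves, then intersect the hashes. This is an illustration only and is not taken from the paper. The function names, the shared key and the sample e-mail identifiers are all hypothetical. Keyed hashing is also much weaker than secure multiparty computation: anyone who holds the key can test guessed identifiers against the tokens, and the exchange reveals which tokens overlap.

```python
import hashlib
import hmac


def blind(identifier: str, shared_key: bytes) -> str:
    """Return an HMAC-SHA256 token for a normalized identifier."""
    normalized = identifier.strip().lower()
    return hmac.new(shared_key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


def blinded_index(identifiers, shared_key: bytes) -> dict:
    """Map token -> original identifier; each owner keeps this mapping private."""
    return {blind(i, shared_key): i for i in identifiers}


def matched_tokens(index_a: dict, index_b: dict) -> set:
    """Intersect the token sets; only tokens are compared, never raw identifiers."""
    return set(index_a) & set(index_b)


# Hypothetical key and data, assumed for illustration only.
SHARED_KEY = b"hypothetical-key-agreed-out-of-band"
retailer_ids = ["alice@example.com", "Bob@Example.com ", "carol@example.com"]
agency_ids = ["bob@example.com", "dave@example.com"]

retailer_index = blinded_index(retailer_ids, SHARED_KEY)
agency_index = blinded_index(agency_ids, SHARED_KEY)

# In practice only the token sets would be exchanged between the two owners.
common = matched_tokens(retailer_index, agency_index)

# Each owner can map matched tokens back to its own records, and to nothing else.
retailer_matches = sorted(retailer_index[t] for t in common)
print(len(common), retailer_matches)
```

Normalizing before hashing handles only trivial differences in case and whitespace. Real record linkage must also cope with typos and inconsistent fields, which exact-match hashing cannot do, and this is one reason matching across databases is hard to carry out without weakening confidentiality.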