Privacy and Confidentiality in an e-Commerce World: Data Mining, Data Warehousing, Matching and Disclosure Limitation

Stephen E. Fienberg

DOI: https://doi.org/10.1214/088342306000000240

2006-09-11

Abstract: The growing expanse of e-commerce and the widespread availability of online databases raise many fears regarding loss of privacy and many statistical challenges. Even with encryption and other nominal forms of protection for individual databases, we still need to protect against the violation of privacy through linkages across multiple databases. These issues parallel those that have arisen and received some attention in the context of homeland security. Following the events of September 11, 2001, there has been heightened attention in the United States and elsewhere to the use of multiple government and private databases for the identification of possible perpetrators of future attacks, as well as an unprecedented expansion of federal government data mining activities, many involving databases containing personal information. We present an overview of some proposals that have surfaced for the search of multiple databases which supposedly do not compromise possible pledges of confidentiality to the individuals whose data are included. We also explore their link to the related literature on privacy-preserving data mining. In particular, we focus on the matching problem across databases and the concept of "selective revelation" and their confidentiality implications.

Statistics Theory
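The abstract notes that encrypting or otherwise nominally protecting each database does not prevent privacy violations through linkage. A toy example makes this concrete. The sketch below assumes that two organizations each replace a shared identifier (here a name) with an unsalted SHA-256 digest. This setup is hypothetical and chosen for illustration; it is not a scheme taken from the paper. Because the hash is deterministic, the "protected" databases can still be joined on their digests. Anyone holding a list of candidate names can then re-identify the linked records by hashing each guess. All names and attributes are invented.

```python
import hashlib


def pseudonymize(identifier: str) -> str:
    """Replace an identifier with an unsalted SHA-256 digest (a 'nominal' protection)."""
    normalized = identifier.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical toy databases held by two different organizations.
retailer = {
    pseudonymize("Alice Smith"): {"purchases": ["books"]},
    pseudonymize("Bob Jones"): {"purchases": ["garden hose"]},
}
pharmacy = {
    pseudonymize("Alice Smith"): {"prescription": "statins"},
    pseudonymize("Carol Wu"): {"prescription": "insulin"},
}


def link(db_a: dict, db_b: dict) -> dict:
    """Join two pseudonymized databases on the digests they have in common."""
    return {key: {**db_a[key], **db_b[key]} for key in db_a.keys() & db_b.keys()}


def dictionary_attack(linked: dict, candidate_names: list) -> dict:
    """Re-identify linked records by hashing guessed names and comparing digests."""
    lookup = {pseudonymize(name): name for name in candidate_names}
    return {lookup[key]: record for key, record in linked.items() if key in lookup}


linked = link(retailer, pharmacy)
recovered = dictionary_attack(linked, ["Alice Smith", "Dave Lee", "Carol Wu"])
print(recovered)
# {'Alice Smith': {'purchases': ['books'], 'prescription': 'statins'}}
```

A keyed hash, such as an HMAC whose secret key is held by a trusted third party, blocks the outsider's dictionary attack. However, any party holding the key can still link and re-identify records. The linkage risk therefore moves to whoever controls the key rather than disappearing.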

What problem does this paper attempt to address?

The paper addresses how to protect personal privacy and data confidentiality when multiple databases are used for data mining, matching and information sharing, against the background of rapidly growing e-commerce and data warehousing. Specifically, it examines the following aspects:

1. **Challenges to privacy and confidentiality**: The growth of e-commerce and the widespread use of online databases seriously threaten personal privacy. Encryption and other nominal protections may secure an individual database. Even so, privacy can still be violated by linking records across multiple databases.
2. **Privacy issues in linking multiple databases**: Government and private databases may be combined to identify potential terrorists or other criminals. The question is how to do this without breaking the confidentiality pledges made to the individuals whose data are included. After the attacks of September 11, 2001, the US government expanded its data mining of personal information, which heightened public concern about privacy.
3. **Privacy-protection technologies**: The paper reviews techniques intended to search multiple databases without revealing individual records, such as privacy-preserving data mining (PPDM) and secure multiparty computation. It also assesses their effectiveness and limitations.
4. **The concept of selective revelation**: The paper focuses in particular on the problem of matching records across databases and on the concept of "selective revelation". It examines what these approaches imply for confidentiality.
5. **Legal and policy recommendations**: Existing laws and regulations protect privacy inadequately. The paper therefore proposes improvements and policy recommendations to better balance national security needs against personal privacy.

In summary, the paper's core question is how to analyze data drawn from multiple sources without violating personal privacy in an age of large-scale data collection. It proposes a range of technical and policy-level solutions.
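Item 3 above mentions secure multiparty computation. One of its simplest building blocks in the privacy-preserving data mining literature is a "secure sum". In a secure sum, several parties learn the total of their private values without disclosing the values themselves. The sketch below implements it with additive secret sharing and simulates all parties in a single process. It assumes semi-honest parties that follow the protocol and do not collude. The modulus and the per-database counts are hypothetical, and the paper may discuss other protocols.

```python
import secrets

MODULUS = 2**61 - 1  # assumed public modulus, larger than any possible total


def share(value: int, n_parties: int) -> list[int]:
    """Split value into n additive shares that sum to value modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_values: list[int]) -> int:
    """Compute the total of the parties' values using additive secret sharing."""
    n = len(private_values)
    # Row i holds party i's shares; party i sends share j to party j.
    all_shares = [share(v, n) for v in private_values]
    # Party j adds up the shares it received; each share alone is uniformly random.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)]
    # Publishing the partial sums reveals the total but not the individual inputs.
    return sum(partial_sums) % MODULUS


counts = [120, 45, 310]  # hypothetical counts held by three databases
print(secure_sum(counts))  # 475
```

The total itself is revealed by design. With only two parties, each can subtract its own input from the total to learn the other's. Such protocols therefore protect the computation but not necessarily what its output discloses.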
