Abstract: In the era of big data, personal data has become an important resource in scientific research, business analysis, medical services, social computing, and many other fields. Sharing and applying personal data can produce great economic and social value, but improper use can easily disclose private personal information. How to resolve the conflict between data application and personal privacy has become a current research hotspot. Before personal data is shared and used, explicit identifier attributes such as the individual's name must be removed. Even so, an attacker can still reveal an individual's identity or sensitive information through one or more non-sensitive values of the quasi-identifier (QI) attributes, such as gender, age, and region, or through some values of the sensitive attributes (SA), such as salary and disease, in the data set.

Most current data privacy studies assume a simple one-to-one relationship between individuals and records, which is called single-record data. To protect personal privacy in single-record data, scholars have proposed a variety of privacy anonymity models, such as k-anonymity, l-diversity, (α, k)-anonymity, t-closeness, and β-likeness. In practice, however, there are many data sets in which one individual may correspond to multiple records, called multi-record data for short. Applying the above privacy models directly to multi-record data may introduce new privacy risks. To protect the privacy of individuals in multi-record data, several scholars have proposed identity-reserved (IR) privacy models such as IR k-anonymity, IR l-diversity, and IR (α, β)-anonymity, as well as enhanced privacy models such as EIR (α, β)-diversity and EIR l-diversity, under the assumption that the attacker's background knowledge covers only QI information. A few scholars have developed the (k, k^m)-anonymity and (k, l)-diversity models, supposing that the attacker may have background knowledge of either QI information or SA information. However, none of these models provides adequate protection for the privacy of individuals in multi-record data.

This research analyzes the privacy disclosure problem in multi-record data when an attacker has stronger background knowledge, and proposes a new privacy-preserving model and a corresponding algorithm to satisfy the stricter privacy needs of multi-record data applications. In the first part, it discusses the privacy risks that arise when an attacker has background knowledge of the QI information, the SA information, or both, and points out the defects of the current privacy models. It also presents a new privacy disclosure problem named the unclosed itemset fingerprint attack (UCIFA), an attack that exploits strong background knowledge. In the second part, to overcome the UCIFA problem, it requires that the full set of each person's sensitive values, expressed as an itemset, be closed. If an individual's SA itemset cannot satisfy the closure constraint through partitioning the records of individuals into several groups, the itemset should be further processed by means of cracking. On this basis, a new privacy model named closure and enhanced identity-reserved l-diversity (CEIR l-diversity) is presented, which requires that the QI values and the SA values of each individual satisfy EIR l-diversity and the closure constraint, respectively. In the third part, it develops an algorithm called data anonymization based on closure and enhanced l-diversity (DACEL) to make multi-record data satisfy CEIR l-diversity. The algorithm consists of three core steps: first, dividing the records in a multi-record data set into several QI-groups, so that the records of individuals in each group have similar QI values and satisfy the constraint of EIR l-diversity; second, in each QI-group, cracking the sensitive itemset of each individual that contains non-closed subsets into several smaller itemsets, each of which must satisfy the closure constraint; finally, in each QI-group, generalizing the QI values of all records so that the anonymized data table satisfies CEIR l-diversity. In the fourth part, the proposed privacy model and its corresponding algorithm, referred to as the CEL-method, are compared with two kinds of state-of-the-art methods on two public multi-record data sets. The results show that the CEL-method performs robustly and efficiently achieves the highest level of privacy protection for multi-record data at the cost of a small information loss.

In summary, in practical applications of personal data, attackers may have different levels of background knowledge with which to disclose private personal information. The privacy-preserving method proposed in this research is of general significance for protecting the privacy of multi-record data in practice.
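The gap between single-record and multi-record data described above can be illustrated with a small Python sketch. It is not the DACEL algorithm or the CEIR l-diversity model; it only contrasts the standard record-level checks (k-anonymity, distinct l-diversity) with an individual-level count in the spirit of the identity-reserved models. The toy table, the attribute names, the hypothetical `pid` column (an internal person identifier that would not be released), and all function names are assumptions made for illustration.

```python
from collections import defaultdict

def qi_groups(records, qi_attrs):
    """Group records that share the same (generalized) QI values."""
    groups = defaultdict(list)
    for rec in records:
        groups[tuple(rec[a] for a in qi_attrs)].append(rec)
    return groups

def is_k_anonymous(records, qi_attrs, k):
    """Record-level k-anonymity: every QI group holds at least k records."""
    return all(len(g) >= k for g in qi_groups(records, qi_attrs).values())

def is_k_anonymous_by_individual(records, qi_attrs, k, id_attr="pid"):
    """Individual-level variant (assumed reading of the identity-reserved
    idea): every QI group covers at least k distinct individuals."""
    return all(len({r[id_attr] for r in g}) >= k
               for g in qi_groups(records, qi_attrs).values())

def is_distinct_l_diverse(records, qi_attrs, sa_attr, l):
    """Distinct l-diversity: every QI group has at least l distinct SA values."""
    return all(len({r[sa_attr] for r in g}) >= l
               for g in qi_groups(records, qi_attrs).values())

# Hypothetical multi-record table: person 1 contributes three records.
QI = ["age", "region"]
records = [
    {"pid": 1, "age": "30-39", "region": "North", "disease": "flu"},
    {"pid": 1, "age": "30-39", "region": "North", "disease": "asthma"},
    {"pid": 1, "age": "30-39", "region": "North", "disease": "diabetes"},
    {"pid": 2, "age": "40-49", "region": "South", "disease": "flu"},
    {"pid": 3, "age": "40-49", "region": "South", "disease": "gastritis"},
]

print(is_k_anonymous(records, QI, 2))                    # True
print(is_k_anonymous_by_individual(records, QI, 2))      # False
print(is_distinct_l_diverse(records, QI, "disease", 2))  # True
```

In this toy table the record-level checks pass, yet the first QI group contains only one person, so an attacker who knows that person's QI values learns all three of their diseases. This is the kind of risk that motivates counting individuals rather than records in multi-record data.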
