Abstract: In the era of big data, personal data has become an important resource in scientific research, business analysis, medical services, social computing and many other fields. Sharing and applying personal data can produce great economic and social value, but improper use of it can easily disclose private personal information. Resolving the conflict between data application and personal privacy has therefore become a research hotspot. Before personal data is shared and used, explicit identifier attributes such as the individual's name must be removed. Even so, an attacker can still reveal an individual's identity or sensitive information through one or more non-sensitive values of the quasi-identifier (QI) attributes, such as gender, age, and region, or through some values of the sensitive attributes (SA), such as salary and disease, in the data set.

Most current data privacy studies assume a simple one-to-one relationship between individuals and records, which is called single-record data. To protect personal privacy in single-record data, scholars have proposed a variety of typical privacy anonymity models, such as k-anonymity, l-diversity, (α,k)-anonymity, t-closeness and β-likelihood. In practice, however, there are many data sets in which one individual may correspond to multiple records, called multi-record data for short. Applying the above privacy models directly to multi-record data may introduce new privacy risks. To protect the privacy of individuals in multi-record data, several scholars have proposed identity-reserved (IR) privacy models such as IR k-anonymity, IR l-diversity and IR (α,β)-anonymity, as well as enhanced privacy models such as EIR (α,β)-diversity and EIR l-diversity, under the assumption that the attacker's background knowledge relates only to QI information. A few scholars have developed the (k,k^m)-anonymity and (k,l)-diversity models, supposing that the attacker may know background knowledge about either QI information or SA information. However, none of these models provides adequate protection for the privacy of individuals in multi-record data.

This research analyzes the privacy disclosure problem in multi-record data when an attacker has stronger background knowledge, and proposes a new privacy-preserving model and a corresponding algorithm to satisfy the stricter privacy needs of multi-record data applications. In the first part, it discusses the privacy risks that arise when an attacker knows background knowledge related to either or both of the QI and SA information, and points out the defects of the current privacy models. It also presents a new privacy disclosure problem named the unclosed itemset fingerprint attack (UCIFA), which is based on attacks that use strong background knowledge. In the second part, to overcome the UCIFA problem, it requires that each individual's full set of sensitive values, expressed as an itemset, be closed. If an individual's SA itemset cannot satisfy the closure constraint by partitioning the records of individuals into several groups, the itemset should be further processed by means of cracking. On this basis, a new privacy model named closure and enhanced identity-reserved l-diversity (CEIR l-diversity) is presented, which requires that the QI values and the SA values of each individual satisfy EIR l-diversity and the closure constraint, respectively. In the third part, it develops an algorithm called data anonymization based on closure and enhanced l-diversity (DACEL) to make multi-record data satisfy CEIR l-diversity. The algorithm consists of three core steps: first, dividing the records in a multi-record data set into several QI-groups, so that the records of individuals in each group have similar QI values and satisfy the constraint of EIR l-diversity; second, in each QI-group, cracking the sensitive itemset of each individual that contains non-closed subsets into several smaller itemsets, each of which must satisfy the closure constraint; finally, in each QI-group, generalizing the QI values of all records so that the anonymized data table satisfies CEIR l-diversity. In the fourth part, the proposed privacy model and its corresponding algorithm, together referred to as the CEL-method, are compared with two leading-edge methods on two public multi-record data sets. The results show that the CEL-method performs robustly and efficiently achieves the highest level of privacy protection for multi-record data at the cost of a small information loss.

In summary, in practical applications of personal data, attackers may have different levels of background knowledge with which to disclose private personal information. The privacy-preserving method proposed in this research is of general significance for privacy protection in practical applications of multi-record data.
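The risk of applying a single-record model directly to multi-record data can be illustrated with a minimal Python sketch. It is not the paper's DACEL algorithm. It only contrasts counting records with counting distinct individuals in each QI-group, assuming that an identity-reserved check requires at least k distinct individuals per group. The toy table, the `pid` field (a hypothetical individual identifier used only for checking) and both function names are assumptions made for illustration.

```python
from collections import defaultdict

def group_sizes(records, qi_keys):
    """Map each QI-group to (number of records, number of distinct individuals)."""
    rows = defaultdict(int)
    people = defaultdict(set)
    for rec in records:
        key = tuple(rec[k] for k in qi_keys)
        rows[key] += 1
        people[key].add(rec["pid"])  # "pid" is a hypothetical individual id
    return {key: (rows[key], len(people[key])) for key in rows}

def is_k_anonymous(records, qi_keys, k, per_individual=False):
    """Check that every QI-group holds at least k records, or, when
    per_individual is True, at least k distinct individuals."""
    idx = 1 if per_individual else 0
    return all(sizes[idx] >= k for sizes in group_sizes(records, qi_keys).values())

# Toy multi-record table: individual 1 contributes three records.
records = [
    {"pid": 1, "gender": "F", "age": "30-39", "disease": "flu"},
    {"pid": 1, "gender": "F", "age": "30-39", "disease": "asthma"},
    {"pid": 1, "gender": "F", "age": "30-39", "disease": "diabetes"},
    {"pid": 2, "gender": "M", "age": "40-49", "disease": "flu"},
    {"pid": 3, "gender": "M", "age": "40-49", "disease": "gout"},
    {"pid": 4, "gender": "M", "age": "40-49", "disease": "flu"},
]
QI = ("gender", "age")

record_level = is_k_anonymous(records, QI, k=3)
individual_level = is_k_anonymous(records, QI, k=3, per_individual=True)
print(record_level, individual_level)
```

In this toy table the record-level check passes for k = 3, while the individual-level check fails, because the (F, 30-39) group contains three records that all belong to one person. An attacker who knows that person's QI values would learn all three diseases.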

Research on data privacy protection method with one-to-multiple records
