Abstract: As data has become a salient asset worldwide, dependence among data has kept growing, so the real-world datasets in use today are highly correlated. Over the past few years, researchers have turned their attention to this aspect of data privacy and have found correlation among data. Existing privacy algorithms cannot assure the expected privacy guarantees on such data: the guarantees they provide were sufficient only when no relation existed between the data in a dataset. Taking data correlation into account, there is therefore a dire need to reconsider these privacy algorithms. Some research has used a well-known machine learning concept, data correlation analysis, to better understand the relationships among data, and this approach has given some promising results. Though still compact, the literature on correlated data privacy already represents a considerable amount of research. Researchers have provided solutions using probabilistic models, behavioral analysis, sensitivity analysis, information theory models, statistical correlation analysis, exhaustive combination analysis, temporal privacy leakages, and weighted hierarchical graphs. Nevertheless, researchers work with real-world datasets that are often large (termed big data) and contain a high degree of data correlation. The data correlation in big data must first be studied, and researchers are exploring different analysis techniques to find the most suitable one. They may then propose a measure to guarantee privacy for correlated big data. This paper presents a detailed survey of the methods proposed by different researchers to deal with the problems of correlated data privacy and correlated big data privacy, and highlights the future scope in this area.
The quantitative analysis of the reviewed articles suggests that data correlation is a significant threat to data privacy, and that this threat is magnified with big data. When data correlation is considered and analyzed, parameters such as the maximum number of queries executed and the mean average error show better results than with other methods. Hence, there is a grave need to understand correlated big data privacy and to propose solutions for it.

Data release for machine learning via correlated differential privacy

Correlated Differential Privacy of Multiparty Data Release in Machine Learning

Enhancing correlated big data privacy using differential privacy and machine learning

Privacy-Preserving Correlated Data Publication: Privacy Analysis and Optimal Noise Design

Correlated tuple data release via differential privacy

Machine learning concepts for correlated Big Data privacy

Differential privacy medical data publishing method based on attribute correlation

Differentially Private Streaming Data Release under Temporal Correlations via Post-processing

Collecting Multi-type and Correlation-Constrained Streaming Sensor Data with Local Differential Privacy

DPMLBench: Holistic Evaluation of Differentially Private Machine Learning

Game Theory Based Correlated Privacy Preserving Analysis in Big Data

Quantifying Differential Privacy in Continuous Data Release Under Temporal Correlations

Differentially private publication for related POI discovery

Pufferfish Privacy Mechanisms for Correlated Data

Correlation Analysis for Key-Value Data with Local Differential Privacy

Feature Selection from Differentially Private Correlations

Correlated Privacy Mechanisms for Differentially Private Distributed Mean Estimation

Differentially Private Support Vector Machines with Knowledge Aggregation

Differentially Private Online Federated Learning with Correlated Noise

Improving Utility for Privacy-Preserving Analysis of Correlated Columns using Pufferfish Privacy

DPPro: Differentially Private High-Dimensional Data Release Via Random Projection