Abstract: Because machine learning models, especially black-box malicious models vulnerable to attribute inference attacks, can leak a great deal of private information, recent work has focused on assessing these models to prevent unexpected attribute privacy leakage. While model privacy risk evaluations have had some success, these traditional solutions are brittle in practice: they require white-box access to obtain the outputs of the model's feature layers, and their evaluation results are heavily influenced by the training dataset and the model structure, which makes them difficult to generalize. In this paper, we propose a novel unawareness detection mechanism that discovers black-box malicious models and quantifies the potential unawareness privacy leakage risk of machine learning models, overcoming these two limitations. We propose a new method for quantifying the privacy risk caused by a specific loss function, which mitigates the impact of the training dataset and the model structure. We also propose a new evaluation model that uses the Matthews correlation coefficient score as its metric and the final output of the target model as its input. In addition, we give a mathematical formula for the theoretical upper bound of the model privacy risk, which is positively correlated with the mutual information between the sensitive attributes and the target model outputs. Compared with traditional detection methods, our evaluation model reduces the requirement for model access and minimizes evaluation errors caused by data imbalance, and our privacy risk assessment method and theoretical upper bounds on privacy risk apply to a broader range of datasets and target model structures.
The experimental results show that the adversary's prediction capability is affected by the distribution of the datasets and by the level of malicious intent of the model, which is consistent with the theoretical prediction, and that the detection method finds potential model privacy leaks on the public datasets UTKFace and FairFace.
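
The two quantities named in the abstract, the Matthews correlation coefficient (MCC) and the mutual information between sensitive attributes and model outputs, can be illustrated with a minimal sketch. This is not the paper's evaluation model or its bound: the function names, the binary-attribute setting, and the plug-in (frequency-count) mutual information estimate are all assumptions made for illustration. It shows why MCC is less misleading than accuracy on imbalanced data, and how mutual information separates outputs that reveal the attribute from outputs that do not.

```python
import math
from collections import Counter


def matthews_corrcoef(y_true, y_pred):
    """MCC for binary labels in {0, 1}; returns 0.0 when the denominator is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in joint.items()
    )


# Hypothetical imbalanced sensitive attribute: 90 ones, 10 zeros.
sensitive = [1] * 90 + [0] * 10
# A trivial adversary that always guesses the majority class.
always_one = [1] * 100
accuracy = sum(t == p for t, p in zip(sensitive, always_one)) / len(sensitive)
mcc_trivial = matthews_corrcoef(sensitive, always_one)

# Hypothetical discretized model outputs: one copy of the attribute, one independent of it.
attr = [0, 0, 1, 1]
mi_leaky = mutual_information(attr, [0, 0, 1, 1])
mi_independent = mutual_information(attr, [0, 1, 0, 1])
```

In this toy setting the majority-class adversary reaches high accuracy while its MCC is zero, which is the kind of imbalance-induced error that an MCC-based metric avoids. The mutual information estimate is one bit when the output copies a balanced binary attribute and zero when the output is independent of it.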

Information Leakage Detection through Approximate Bayes-optimal Prediction

Obtaining Information Leakage Bounds via Approximate Model Counting

Hybrid Statistical Estimation of Mutual Information and its Application to Information Flow

Modelling and Quantifying Membership Information Leakage in Machine Learning

Detection of Information leakage in cloud

A New Approach to Adaptive Data Analysis and Learning via Maximal Leakage

Approximate Optimal Estimation Based on Kullback–Leibler Divergence for Lossy Networks Without Acknowledgement

Unawareness detection: Discovering black-box malicious models and quantifying privacy leakage risks

Survey: Leakage and Privacy at Inference Time

Measuring Quantum Information Leakage Under Detection Threat

Breach By A Thousand Leaks: Unsafe Information Leakage in 'Safe' AI Responses

The Asymptotic Behaviour of Information Leakage Metrics

Comparison Of Measuring Information Leakage For Fully Probabilistic Systems

Optimal Privacy-Aware Dynamic Estimation

Blind Faith: Privacy-Preserving Machine Learning using Function Approximation

Maximal Guesswork Leakage

Data Disclosure with Non-zero Leakage and Non-invertible Leakage Matrix

Learning in the Dark: Privacy-Preserving Machine Learning using Function Approximation

How Much Does Each Datapoint Leak Your Privacy? Quantifying the Per-datum Membership Leakage

Minimax-Optimal Bounds for Detectors Based on Estimated Prior Probabilities

Bayesian Learned Models Can Detect Adversarial Malware For Free