Abstract: Because machine learning models, especially black-box malicious models vulnerable to attribute inference attacks, can leak a great deal of private information, recent work has focused on assessing these models to prevent unexpected attribute privacy leakage. Although model privacy risk evaluation has had some success, traditional solutions are brittle in practice: they require white-box access to obtain the outputs of the model's feature layers, and their evaluation results are heavily influenced by the training dataset and the model structure, which makes them hard to generalize. In this paper, we propose a novel unawareness detection mechanism that discovers black-box malicious models and quantifies the potential unawareness privacy leakage risk of machine learning models, overcoming both limitations. First, we propose a new method for quantifying the privacy risk caused by a specific loss function, which mitigates the impact of the training dataset and the model structure. Second, we propose a new evaluation model that uses the Matthews correlation coefficient (MCC) score as its metric and the final output of the target model as its input. In addition, we give a theoretical upper bound on the model privacy risk as a mathematical formula that is positively correlated with the mutual information between the sensitive attributes and the target model outputs. Compared with traditional detection methods, our evaluation model reduces the level of model access required and minimizes evaluation errors caused by data imbalance, and our privacy risk assessment method and theoretical upper bound on privacy risk apply to a broader range of datasets and target model structures.
The experimental results show that the adversary's prediction capability is affected by the distribution of the datasets and by the model's level of malicious intent, which is consistent with the theoretical prediction, and that the detection method can find potential model privacy leaks on the public datasets UTKFace and FairFace.
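As a rough illustration of two quantities the abstract names, the sketch below computes the Matthews correlation coefficient and an empirical mutual information estimate for discrete labels. It is not the authors' implementation: the function names (`mcc`, `accuracy`, `mutual_information`) and the toy data are assumptions made for this example, and the plug-in mutual information estimate is only one simple way to measure dependence between a sensitive attribute and discretized model outputs.

```python
import math
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true binary labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}.

    Returns 0.0 when any marginal count is zero (the usual convention),
    which is what happens when a predictor outputs a single class.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


if __name__ == "__main__":
    # Hypothetical imbalanced sensitive attribute: 5 positives, 95 negatives.
    y_true = [1] * 5 + [0] * 95
    # A trivial adversary that always guesses the majority class.
    y_pred = [0] * 100
    print("accuracy:", accuracy(y_true, y_pred))
    print("MCC:", mcc(y_true, y_pred))
    # Dependence between an attribute and (discretized) outputs.
    print("MI (identical):", mutual_information([0, 0, 1, 1], [0, 0, 1, 1]))
    print("MI (independent):", mutual_information([0, 1, 0, 1], [0, 0, 1, 1]))
```

On the imbalanced toy data, the majority-class adversary scores high accuracy while its MCC is zero, which is the kind of imbalance-induced evaluation error the abstract says an MCC-based metric is meant to reduce.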

Estimating Web Attack Detection via Model Uncertainty from Inaccurate Annotation

Model Uncertainty Based Annotation Error Fixing for Web Attack Detection

Towards Trustworthy Web Attack Detection: An Uncertainty-Aware Ensemble Deep Kernel Learning Model

Crafting and Detecting Adversarial Web Requests

MalCertain: Enhancing Deep Neural Network Based Android Malware Detection by Tackling Prediction Uncertainty

Uncertainty is Fragile: Manipulating Uncertainty in Large Language Models

Enhancing Trustworthiness in ML-Based Network Intrusion Detection with Uncertainty Quantification

Can We Leverage Predictive Uncertainty to Detect Dataset Shift and Adversarial Examples in Android Malware Detection?

Bayesian Learned Models Can Detect Adversarial Malware For Free

Extensible Machine Learning for Encrypted Network Traffic Application Labeling via Uncertainty Quantification

ML-DOCTOR: Holistic Risk Assessment of Inference Attacks Against Machine Learning Models

SecurityNet: Assessing Machine Learning Vulnerabilities on Public Models

Understanding and Tackling Label Errors in Deep Learning-Based Vulnerability Detection (Experience Paper)

Unawareness detection: Discovering black-box malicious models and quantifying privacy leakage risks

Leveraging Uncertainty for Improved Static Malware Detection Under Extreme False Positive Constraints

Towards characterizing adversarial defects of deep learning software from the lens of uncertainty

Backdoor Attack Detection Via Prediction Trustworthiness Assessment

Attack and Defense of Deep Learning Models in the Field of Web Attack Detection

Open Set Recognition for Malware Traffic via Predictive Uncertainty

Towards Improving the Trustworthiness of Hardware based Malware Detector using Online Uncertainty Estimation

Robustness Quantification Method for Network Intrusion Detection Models