Abstract: Artificial Intelligence (AI) has achieved remarkable success in image generation, image analysis, and language modeling, making data-driven techniques increasingly relevant in real-world applications and promising enhanced creativity and efficiency for human users. However, the deployment of AI in high-stakes domains such as infrastructure and healthcare still raises concerns regarding algorithm accountability and safety. The emerging field of explainable AI (XAI) has made significant strides in developing interfaces that enable humans to comprehend the decisions made by data-driven models. Among these approaches, concept-based explainability stands out due to its ability to align explanations with high-level concepts familiar to users. Nonetheless, early research in adversarial machine learning has revealed that exposing model explanations can render victim models more susceptible to attacks. This is the first study to investigate and compare the impact of concept-based explanations on the privacy of deep-learning-based AI models in the context of biomedical image analysis. An extensive privacy benchmark is conducted on three state-of-the-art model architectures (ResNet50, NFNet, ConvNeXt) trained on two biomedical datasets (ISIC and EyePACS) and one synthetic dataset (SCDB). The success of membership inference attacks while exposing varying degrees of attribution-based and concept-based explanations is systematically compared. The findings indicate that, in theory, concept-based explanations can potentially increase the vulnerability of a private AI system by up to 16% compared to attributions in the baseline setting. However, it is demonstrated that, in more realistic attack scenarios, the threat posed by explanations is negligible in practice. Furthermore, actionable recommendations are provided to ensure the safe deployment of concept-based XAI systems.
In addition, the impact of differential privacy (DP) on the quality of concept-based explanations is explored, revealing that, while negatively influencing the explanation ability, DP can have an adverse effect on the models' privacy.

The privacy issue of counterfactual explanations: explanation linkage attacks

Privacy Implications of Explainable AI in Data-Driven Systems

Private Counterfactual Retrieval

Disagreement amongst counterfactual explanations: how transparency can be misleading

Disagreement amongst counterfactual explanations: How transparency can be deceptive

Explanation Leaks: Explanation-guided Model Extraction Attacks

A Survey of Privacy-Preserving Model Explanations: Privacy Risks, Attacks, and Countermeasures

Exploring Counterfactual Explanations Through the Lens of Adversarial Examples: A Theoretical and Empirical Analysis

Translating theory into practice: assessing the privacy implications of concept-based explanations for biomedical AI

Counterfactual Explanations and Algorithmic Recourses for Machine Learning: A Review

Privacy Meets Explainability: A Comprehensive Impact Benchmark

"How do I fool you?": Manipulating User Trust via Misleading Black Box Explanations

PreCoF: counterfactual explanations for fairness

Towards Explainable Model Extraction Attacks

The privacy-explainability trade-off: unraveling the impacts of differential privacy and federated learning on attribution methods

Privacy-preserving explainable AI: a survey

Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Explaining Black-Box Algorithms Using Probabilistic Contrastive Counterfactuals

Knowledge Distillation-Based Model Extraction Attack using GAN-based Private Counterfactual Explanations

Critical Empirical Study on Black-box Explanations in AI