GraphTheft: Quantifying Privacy Risks in Graph Prompt Learning

Jiani Zhu, Xi Lin, Yuxin Qi, Qinghua Mao
2024-11-22
Abstract: Graph Prompt Learning (GPL) represents an innovative approach in graph representation learning, enabling task-specific adaptations by fine-tuning prompts without altering the underlying pre-trained model. Despite its growing prominence, the privacy risks inherent in GPL remain unexplored. In this study, we provide the first evaluation of privacy leakage in GPL across three attacker capabilities: black-box attacks when GPL is offered as a service, and scenarios where node embeddings and prompt representations are accessible to third parties. We assess GPL's privacy vulnerabilities through Attribute Inference Attacks (AIAs) and Link Inference Attacks (LIAs), finding that under any capability, attackers can effectively infer the properties and relationships of sensitive nodes, and the success rate of inference on some datasets is as high as 98%. Importantly, while targeted inference attacks on specific prompts (e.g., GPF-plus) maintain high success rates, our analysis suggests that the prompt-tuning in GPL does not significantly elevate privacy risks compared to traditional GNNs. To mitigate these risks, we explored defense mechanisms, identifying that Laplacian noise perturbation can substantially reduce inference success, though balancing privacy protection with model performance remains challenging. This work highlights critical privacy risks in GPL, offering new insights and foundational directions for future privacy-preserving strategies in graph learning.
Cryptography and Security
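The Laplacian noise perturbation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say where the noise is injected or how its scale is chosen, so applying independent noise to every coordinate of a node embedding, and the names `perturb_embeddings`, `laplace_noise`, and `mean_abs_change`, are assumptions made only for illustration and are not taken from the paper.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    Uses the fact that the difference of two i.i.d. exponential variables
    with mean `scale` follows a Laplace(0, scale) distribution.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_embeddings(embeddings, scale, seed=0):
    """Return a noisy copy of `embeddings` (a list of equal-length float lists).

    Assumption: noise is added independently to every coordinate; the paper's
    actual injection point and scale are not specified in the abstract.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    rng = random.Random(seed)
    return [[x + laplace_noise(scale, rng) for x in row] for row in embeddings]


def mean_abs_change(original, noisy):
    """Average absolute per-coordinate distortion introduced by the noise."""
    diffs = [abs(a - b) for ro, rn in zip(original, noisy) for a, b in zip(ro, rn)]
    return sum(diffs) / len(diffs)


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings for three nodes.
    emb = [[0.2, -1.0], [0.5, 0.3], [-0.7, 0.9]]
    for scale in (0.1, 1.0):
        noisy = perturb_embeddings(emb, scale, seed=42)
        print(scale, mean_abs_change(emb, noisy))
```

A larger `scale` distorts the embeddings more, which is the privacy and utility trade-off the abstract describes: stronger noise hides more from an attacker but also degrades what downstream tasks can recover.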
### What problem does this paper attempt to address?

This paper evaluates and quantifies the privacy risks in Graph Prompt Learning (GPL). Specifically, the authors ask whether an attacker can infer the attributes and relationships of sensitive nodes under the GPL framework at different capability levels (black-box access, access to node embeddings, and access to prompt representations). The main research contents include:

1. **Privacy Leakage Evaluation**:
   - The authors present the first systematic evaluation of privacy leakage in GPL under three attacker capabilities: black-box attacks when GPL is offered as a service, and cases where third parties can access node embeddings or prompt representations.
   - Using Attribute Inference Attacks (AIAs) and Link Inference Attacks (LIAs), the study finds that attackers can effectively infer the attributes and relationships of sensitive nodes under every capability, with an inference success rate of up to 98% on some datasets.
2. **Comparison with Traditional GNNs**:
   - Although targeted inference attacks on specific prompts (such as GPF-plus) maintain a high success rate, prompt tuning in GPL does not significantly increase privacy risk compared with traditional GNNs.
3. **Exploration of Defense Mechanisms**:
   - To mitigate these privacy threats, the authors explore defense mechanisms and find that Laplacian noise perturbation can substantially reduce inference success, although preserving model performance while protecting privacy remains a challenge.
4. **Summary of Contributions**:
   - **First Privacy Risk Assessment**: Conducted a comprehensive privacy risk assessment of GPL, especially in node classification tasks, revealing important privacy vulnerabilities.
   - **Design of Inference Attacks**: Introduced AIAs and LIAs to evaluate the privacy risks of GPL and demonstrated high attack success rates.
   - **Evaluation of Defense Mechanisms**: Proposed a preliminary defense based on Laplacian perturbation; experimental results show that it can effectively reduce the attack success rate.
   - **Cross-Dataset Verification**: Verified the effectiveness of the attacks and defenses on six real-world datasets and five typical GPL methods, establishing a relatively comprehensive privacy risk assessment framework.

### Formula Presentation

- **Node Embedding Update Formula**:
  \[ h_v^{(l)}=\text{Update}^l\left(h_N^{(l)}(v)\right) \]
  where \(h_N^{(l)}(v)=\text{Aggregate}^l\left(\{h_u^{(l-1)}\mid u\in N(v)\}\right)\), \(N(v)\) is the set of neighbors of node \(v\), and \(h_u^{(l-1)}\) is the embedding of node \(u\) at layer \(l-1\).
- **Deep Graph Infomax (DGI) Loss Function**:
  \[ L_{\text{DGI}} = -\log\sigma(\hat{g}^T\hat{g}') \]
- **Edge Prediction (EdgePred) Loss Function**:
  \[ L_{\text{EdgePred}} = -\log\left(\sum_{(u,v)\in E}s_{uv}+\sum_{(u,v)\notin E}(1 - s_{uv})\right) \]
- **Self-Supervised Graph Masked Auto-Encoder (GraphMAE) Loss Function** (scaled cosine error, SCE):
  \[ L_{\text{SCE}}=\frac{1}{|\tilde{V}|}\sum_{v_i\in\tilde{V}}\left(1-\frac{x_i^Tz_i}{\|x_i\|\cdot\|z_i\|}\right)^\gamma,\quad\gamma\geq1