Variance of the Gradient Also Matters: Privacy Leakage from Gradients.

Yijue Wang, Jieren Deng, Dan Guo, Chenghong Wang, Xianrui Meng, Hang Liu, Chao Shang, Binghui Wang, Qin Cao, Caiwen Ding, Sanguthevar Rajasekaran
DOI: https://doi.org/10.1109/ijcnn55064.2022.9892665
2022-01-01
Abstract: Distributed machine learning (DML) enables model training on a large corpus of decentralized user data and collects only local models or gradients for global synchronization in the cloud. Recent studies show that a third party can recover the training data in a DML system from publicly shared gradients. Our investigation reveals that existing techniques (e.g., DLG) can recover the training data only when the weights follow a uniform distribution; they fail under other weight initializations (e.g., a normal distribution) or during the training stage. In this work, we analyze how the weight distribution affects training data recovery from gradients. Based on this analysis, we propose a self-adaptive privacy attack from gradients (SAPAG), a general gradient attack algorithm that can recover the training data in DML with any weight initialization and in any training phase. Our algorithm exploits not only the gradients but also the variance of the gradients. Specifically, we use the variance of the gradient distribution and the Deep Neural Network (DNN) architecture to design an adaptive Gaussian kernel of the gradient difference as a distance measure. Our experimental results on various benchmark datasets and tasks demonstrate the generalizability of SAPAG. SAPAG outperforms state-of-the-art algorithms in both data recovery performance and recovery speed.
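The abstract does not give the exact form of the distance measure, so the following is only a minimal sketch of what a variance-adaptive Gaussian kernel over gradient differences might look like. The function name `gaussian_kernel_distance`, the scale parameter `q`, and the choice of the per-layer variance of the shared gradients as the kernel bandwidth are all assumptions made for illustration, not the paper's definition. Each layer's gradient is represented as a flat list of floats.

```python
import math
from statistics import pvariance


def gaussian_kernel_distance(dummy_grads, shared_grads, q=1.0, eps=1e-12):
    """Hypothetical distance between dummy and shared gradients.

    For each layer, computes 1 - exp(-||g_dummy - g_shared||^2 / bandwidth),
    where bandwidth = q * var(g_shared). The bandwidth choice is an
    assumption of this sketch, not taken from the paper.
    """
    total = 0.0
    for g_dummy, g_shared in zip(dummy_grads, shared_grads):
        # Squared Euclidean distance between the two gradients of this layer.
        sq_diff = sum((a - b) ** 2 for a, b in zip(g_dummy, g_shared))
        # Bandwidth adapts to the spread of the shared gradient in this layer;
        # eps guards against division by zero for constant gradients.
        bandwidth = q * pvariance(g_shared) + eps
        total += 1.0 - math.exp(-sq_diff / bandwidth)
    return total


# Two layers of shared gradients (made-up numbers for illustration).
shared = [[0.1, -0.2, 0.3], [1.0, 2.0]]
near = [[0.1, -0.2, 0.31], [1.0, 2.0]]
far = [[5.0, 5.0, 5.0], [-3.0, 9.0]]

d_same = gaussian_kernel_distance(shared, shared)
d_near = gaussian_kernel_distance(near, shared)
d_far = gaussian_kernel_distance(far, shared)
```

In this sketch each layer contributes a value between 0 and 1, so a layer with large-magnitude gradients cannot dominate the objective in the way it can with a plain squared-error distance. An attacker would minimize such a distance over the dummy inputs and labels that produce `dummy_grads`; that optimization loop is omitted here.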