Privacy-Preserving Collaborative Model Learning: the Case of Word Vector Training

Qian Wang, Minxin Du, Xiuying Chen, Yanjiao Chen, Pan Zhou, Xiaofeng Chen, Xinyi Huang
DOI: https://doi.org/10.1109/tkde.2018.2819673
IF: 9.235
2018-01-01
IEEE Transactions on Knowledge and Data Engineering
Abstract: Machine learning is becoming a new paradigm for mining hidden knowledge in big data. The collection and manipulation of big data not only creates considerable value but also raises serious privacy concerns. To protect the huge amount of potentially sensitive data, a straightforward approach is to encrypt it with specialized cryptographic tools. However, it is challenging to use or operate on encrypted data, especially when running machine learning algorithms. In this paper, we investigate the problem of training high-quality word vectors over large-scale encrypted data (from distributed data owners) using privacy-preserving collaborative neural network learning algorithms. We leverage and also design a suite of arithmetic primitives (e.g., multiplication, fixed-point representation, and sigmoid function computation) on encrypted data, which serve as components of our construction. We theoretically analyze the security and efficiency of the proposed construction and conduct extensive experiments on representative real-world datasets to verify its practicality and effectiveness.