Abstract: Machine learning is now widely applied in fields such as e-commerce, mobile data processing, health analytics, and behavioral analytics. Word vector training is commonly used in machine learning to provide a model architecture and optimization, for example to learn word embeddings from large datasets. Training word vectors requires large datasets and outputs a model; however, some of these datasets might contain private and sensitive information, and the training phase can expose both the trained model and the users' datasets. Offering useful, plausible, and personalized alternatives to users therefore usually entails a breach of their privacy. For instance, user data might contain faces, irises, and personal identities. This poses a serious problem for machine learning. In this article, we investigate the problem of training high-quality word vectors on encrypted datasets using privacy-preserving learning algorithms. First, we use a pseudo-random function to generate a statistical token for each word to help build the vocabulary of the word vector. Then, we employ functional inner-product encryption to securely compute the inner product used in the activation function. Finally, we use the BGN cryptosystem to encrypt and hide the sensitive datasets, and carry out the training procedure through homomorphic operations over the ciphertexts. To achieve privacy-preserving word vector training, we propose four privacy-preserving machine learning schemes. We analyze the security and efficiency of our protocols and present numerical experiments. Compared with existing solutions, the results indicate that our scheme provides higher efficiency and lower communication overhead.
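The first step of the abstract (a pseudo-random function that maps each word to a token used to build the vocabulary) can be illustrated with a minimal sketch. The abstract does not name a specific PRF, so HMAC-SHA256 is assumed here as a stand-in; the key value, the function names `word_token` and `build_token_vocabulary`, and the whitespace tokenization are all hypothetical choices made for illustration, not the paper's construction.

```python
import hashlib
import hmac
from collections import Counter


def word_token(key: bytes, word: str) -> str:
    """Return a deterministic pseudo-random token for a word.

    Assumption: HMAC-SHA256 plays the role of the PRF.
    """
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def build_token_vocabulary(key: bytes, corpus: list[str]) -> Counter:
    """Count how often each token occurs, without storing plaintext words."""
    counts: Counter = Counter()
    for sentence in corpus:
        # Assumption: simple lowercase whitespace tokenization.
        for word in sentence.lower().split():
            counts[word_token(key, word)] += 1
    return counts


# Hypothetical secret key held by the data owner.
key = b"hypothetical-shared-secret-key"
corpus = ["the cat sat", "the dog sat"]
vocab = build_token_vocabulary(key, corpus)
```

Because the token is deterministic under a fixed key, equal words map to equal tokens, so word frequencies can be counted over tokens alone. A party without the key cannot recompute a token from a guessed word, although the frequency pattern itself remains visible in this simplified sketch.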

Privacy-Preserving Collaborative Model Learning: the Case of Word Vector Training

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

A secure and privacy-preserving word vector training scheme based on functional encryption with inner-product predicates

Privacy-Preserving Vertical Collaborative Logistic Regression without Trusted Third-Party Coordinator

Training Encrypted Models with Privacy-preserved Data on Blockchain

Pencil: Private and Extensible Collaborative Learning without the Non-Colluding Assumption

Decentralized Collaborative Learning Framework with External Privacy Leakage Analysis

Investigating Privacy Attacks in the Gray-Box Setting to Enhance Collaborative Learning Schemes

Robust and privacy-preserving collaborative training: a comprehensive survey

SecureML: A System for Scalable Privacy-Preserving Machine Learning

A Blockchain-Based Fairness Guarantee Approach for Privacy-Preserving Collaborative Training in Computing Force Network

Adversarial Representation Sharing: A Quantitative and Secure Collaborative Learning Framework

PPCL: Privacy-preserving collaborative learning for mitigating indirect information leakage

Robust and Privacy-Preserving Collaborative Learning: A Comprehensive Survey

How To Construct Support Vector Machines Without Breaching Privacy

Privacy-Preserving Collaborative Learning through Feature Extraction

Understanding Privacy Risks of Embeddings Induced by Large Language Models

Privacy in Large Language Models: Attacks, Defenses and Future Directions

DEPN: Detecting and Editing Privacy Neurons in Pretrained Language Models

Towards Scalable and Privacy-Preserving Deep Neural Network via Algorithmic-Cryptographic Co-design

Distributed Modelling Approaches for Data Privacy Preserving