Abstract: Machine learning is now applied in fields such as e-commerce, mobile data processing, health analytics, and behavioral analytics. Word vector training is commonly used in machine learning to provide a model architecture and optimization procedure, for example to learn word embeddings from large datasets. Training word vectors requires a large amount of data and outputs a model. Some of this data might contain private and sensitive information, and the training phase can expose both the trained model and the user datasets. Offering users useful, plausible, and personalized alternatives therefore usually also entails a breach of their privacy. For instance, user data might contain faces, irises, and personal identities. This poses a serious problem for machine learning. In this article, we investigate the problem of training high-quality word vectors on encrypted datasets by using privacy-preserving learning algorithms. First, we use a pseudo-random function to generate a statistical token for each word to help build the vocabulary of the word vector. Then we employ functional inner-product encryption to securely obtain the inner product required by the activation function. Finally, we use the BGN cryptosystem to encrypt and hide the sensitive datasets, and perform homomorphic operations over the ciphertexts to carry out the training procedure. To achieve privacy preservation in word vector training, we propose four privacy-preserving machine learning schemes. We analyze the security and efficiency of our protocols and present numerical experiments. Compared with existing solutions, the results indicate that our scheme provides higher efficiency and lower communication overhead.
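The token-generation step mentioned in the abstract can be illustrated with a minimal sketch. It is not the paper's construction: it assumes HMAC-SHA256 as the pseudo-random function, and the names `prf_token` and `build_token_vocabulary`, the `min_count` parameter, and the frequency-based ordering are hypothetical choices made for illustration. The idea shown is only that a keyed, deterministic token lets word counts be gathered and a vocabulary indexed without handling the plaintext words.

```python
import hashlib
import hmac
from collections import Counter


def prf_token(key: bytes, word: str) -> str:
    """Return a deterministic keyed token for a word.

    Assumption: HMAC-SHA256 stands in for the pseudo-random function.
    """
    return hmac.new(key, word.encode("utf-8"), hashlib.sha256).hexdigest()


def build_token_vocabulary(key: bytes, corpus: list[list[str]],
                           min_count: int = 1) -> dict[str, int]:
    """Map each token to a vocabulary index, most frequent token first.

    Counting is done on tokens, so equal words collide on the same token
    while the words themselves never appear in the vocabulary.
    """
    counts = Counter(prf_token(key, w) for sentence in corpus for w in sentence)
    kept = [t for t, c in counts.items() if c >= min_count]
    # Sort by descending count, then by token, to make the order deterministic.
    kept.sort(key=lambda t: (-counts[t], t))
    return {token: index for index, token in enumerate(kept)}


if __name__ == "__main__":
    demo_key = b"k" * 32  # placeholder key for the demo only
    demo_corpus = [["the", "cat"], ["the", "dog"]]
    vocab = build_token_vocabulary(demo_key, demo_corpus)
    print(len(vocab), vocab[prf_token(demo_key, "the")])
```

Because the tokens are deterministic under one key, token frequencies mirror word frequencies; whether and how that leakage is handled is a matter for the scheme's security analysis, which this sketch does not attempt to reproduce.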

A Privacy-Preserving Word Embedding Text Classification Model Based on Privacy Boundary Constructed by Deep Belief Network

Privacy-Preserving Collaborative Model Learning: the Case of Word Vector Training

Privacy Preserving PCA for Multiparty Modeling

A Distributed Privacy-Preserving Framework for Deep Learning with Edge-Cloud Computing

A New Noise Generating Method Based on Gaussian Sampling for Privacy Preservation

Understanding and Mitigating the Threat of Vec2Text to Dense Retrieval Systems

Privacy Preserving Naive Bayes Classification

Privacy-Preserving Classification of Personal Text Messages with Secure Multi-Party Computation: An Application to Hate-Speech Detection

A Neighbourhood-Aware Differential Privacy Mechanism for Static Word Embeddings

A secure and privacy-preserving word vector training scheme based on functional encryption with inner-product predicates

Differentially Private Support Vector Machines with Knowledge Aggregation

Multidimensional Perceptron for Efficient and Explainable Long Text Classification

Split-and-Denoise: Protect large language model inference with local differential privacy

PEM: A Practical Differentially Private System for Large-Scale Cross-Institutional Data Mining

2P-DNN : Privacy-Preserving Deep Neural Networks Based on Homomorphic Cryptosystem

Subword Embedding from Bytes Gains Privacy without Sacrificing Accuracy and Complexity

A privacy-preserving decentralized credit scoring method based on multi-party information

A Differentially Private Weighted Empirical Risk Minimization Procedure and its Application to Outcome Weighted Learning

A comprehensive survey and taxonomy on privacy-preserving deep learning

A Privacy-Preserving Classification Mining Algorithm

Privacy-Preserving Classification on Deep Learning with Exponential Mechanism