Abstract: Machine Learning (ML) addresses a multitude of complex problems across disciplines, including the social sciences, finance, and medical research. ML models require substantial computing power and are only as powerful as the data they use. Because of the high computational cost of ML methods, data scientists frequently use Machine Learning-as-a-Service (MLaaS) to outsource computation to external servers. However, when the data are private, such as financial data or health records, outsourcing the computation can lead to privacy issues. Recent advances in Privacy-Preserving Techniques (PPTs) have enabled ML training and inference over protected data through Privacy-Preserving Machine Learning (PPML). However, these techniques are still at a preliminary stage, and applying them in real-world settings remains challenging. To understand the gap between theoretical research proposals and actual applications, this work examines the past and present of PPML, focusing on Homomorphic Encryption (HE) and Secure Multi-party Computation (SMPC) applied to ML. This work primarily focuses on the training phase of ML models, where maintaining user data privacy is of utmost importance. We provide a solid theoretical background that eases the understanding of current approaches and their limitations, along with preliminaries on SMPC, HE, and ML. In addition, we present a systematization of knowledge of the most recent PPML frameworks for model training and provide a comprehensive comparison of their unique properties and their performance on standard benchmarks. We also reproduce the results of some of the surveyed papers and examine to what extent existing works in the field support open science.
We believe our work makes a valuable contribution by raising awareness of the current gap between theoretical advances and real-world applications in PPML, specifically regarding open-source availability, reproducibility, and usability.
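To make "computation over protected data" with SMPC concrete, the sketch below shows additive secret sharing, one common SMPC building block. Several input owners learn the sum of their private values, while each computing party only ever sees uniformly random shares. This is a minimal illustrative sketch under stated assumptions, not the protocol of any framework surveyed here. The modulus `MODULUS` and the helper names `share`, `reconstruct`, and `secure_sum` are hypothetical. The sketch also assumes semi-honest parties and non-negative integer inputs smaller than the modulus.

```python
import secrets

# Hypothetical public parameter: all arithmetic is done modulo this prime
# (2**61 - 1 is a Mersenne prime). Inputs are assumed to lie in [0, MODULUS).
MODULUS = 2**61 - 1


def share(secret: int, n_parties: int) -> list[int]:
    """Split `secret` into n additive shares that sum to it modulo MODULUS."""
    # The first n-1 shares are uniformly random, so any n-1 of them reveal nothing.
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    # The last share is chosen so that all shares sum to the secret.
    last = (secret - sum(shares)) % MODULUS
    return shares + [last]


def reconstruct(shares: list[int]) -> int:
    """Recover a shared value by summing all of its shares modulo MODULUS."""
    return sum(shares) % MODULUS


def secure_sum(private_inputs: list[int]) -> int:
    """Compute the sum of private inputs, one computing party per input owner."""
    n = len(private_inputs)
    # Each input owner i splits its value; share j would be sent to party j.
    all_shares = [share(x, n) for x in private_inputs]
    # Party j locally adds the shares it received and never sees a raw input.
    partial_sums = [
        sum(all_shares[i][j] for i in range(n)) % MODULUS for j in range(n)
    ]
    # Only combining every party's partial sum reveals the total.
    return reconstruct(partial_sums)


print(secure_sum([12, 30, 7]))  # 49
```

Addition of shared values needs no interaction between the parties, which is why sums (as in secure aggregation of model updates) are cheap in this setting. Multiplications, which ML training also requires, need extra protocol machinery that this sketch does not cover.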

Machine Learning with Privacy by Knowledge Aggregation and Transfer

Private Knowledge Transfer via Model Distillation with Generative Adversarial Networks

Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

PKDGAN: Private Knowledge Distillation with Generative Adversarial Networks

Deep Learning with Differential Privacy

Anonymizing Machine Learning Models

An Overview of Privacy in Machine Learning

On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches

SecureML: A System for Scalable Privacy-Preserving Machine Learning

Data Privacy and Trustworthy Machine Learning

Approximate, Adapt, Anonymize (3A): a Framework for Privacy Preserving Training Data Release for Machine Learning

Privacy at a Price: Exploring its Dual Impact on AI Fairness

Privacy Side Channels in Machine Learning Systems

Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization

Protection Against Reconstruction and Its Applications in Private Federated Learning

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Privacy-Preserving Machine Learning Algorithms for Big Data Systems

SoK: Wildest Dreams: Reproducible Research in Privacy-preserving Neural Network Training

Practical and Efficient Secure Aggregation for Privacy-Preserving Machine Learning

Privacy-enhancing machine learning framework with private aggregation of teacher ensembles