Abstract: Machine learning (ML) is increasingly being adopted in a wide variety of application domains. A well-performing ML model usually relies on a large volume of training data and high-powered computational resources. The need for and use of such large volumes of data raise serious privacy concerns because of the potential leakage of highly privacy-sensitive information; further, evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data add significant challenges to fully benefiting from the power of ML for data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Significant research efforts from both academia and industry increasingly aim to integrate privacy-preserving techniques into the ML pipeline or specific algorithms, or to design various PPML architectures. In particular, existing PPML research cross-cuts ML, systems and applications design, and security and privacy; hence, there is a critical need to understand the state of the art, the related challenges, and a roadmap for future research in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage as well as benefit multiple research communities such as ML, distributed systems, security and privacy.

Privacy-Preserving Machine Learning Algorithms for Big Data Systems

Privacy Preserving Distributed DBSCAN Clustering

Poster: Nebula: an Industrial-purpose Privacy-preserving Machine Learning System

Efficient Privacy-Preserving Machine Learning in Hierarchical Distributed System

Towards Secure and Practical Machine Learning Via Secret Sharing and Random Permutation

SecureML: A System for Scalable Privacy-Preserving Machine Learning

Privacy Preserving Analytics on Distributed Medical Data

Privacy preserving distributed machine learning with federated learning

SHAPER: A General Architecture for Privacy-Preserving Primitives in Secure Machine Learning

Distributed Private Online Learning for Social Big Data Computing over Data Center Networks

Distributed Modelling Approaches for Data Privacy Preserving

Privacy-Preserving Generalized Linear Models using Distributed Block Coordinate Descent

Privacy-Preserving Graph Machine Learning from Data to Computation: A Survey

A divide-and-conquer approach to privacy-preserving high-dimensional big data release

Scalable Privacy-Preserving Distributed Learning

Federated Extra-Trees with Privacy Preserving

A Review of Privacy-Preserving Machine Learning Classification

Systematic Review of Privacy-Preserving Distributed Machine Learning From Federated Databases in Health Care

Protection of Big Data Privacy

Toward efficient and privacy-preserving computing in big data era

Privacy-Preserving Machine Learning: Methods, Challenges and Directions