Abstract: Machine learning (ML) is increasingly being adopted in a wide variety of application domains. A well-performing ML model usually relies on a large volume of training data and high-powered computational resources. The need for, and use of, such large volumes of data raises serious privacy concerns because of the risk of leaking highly privacy-sensitive information. In addition, evolving regulatory environments that increasingly restrict access to and use of privacy-sensitive data make it significantly harder to fully benefit from the power of ML in data-driven applications. A trained ML model may also be vulnerable to adversarial attacks such as membership, attribute, or property inference attacks and model inversion attacks. Hence, well-designed privacy-preserving ML (PPML) solutions are critically needed for many emerging applications. Significant research efforts from both academia and industry increasingly aim to integrate privacy-preserving techniques into the ML pipeline or specific algorithms, or to design various PPML architectures. Because existing PPML research cross-cuts ML, systems and applications design, and security and privacy, there is a critical need to understand the state of the art, the related challenges, and a roadmap for future research in the PPML area. In this paper, we systematically review and summarize existing privacy-preserving approaches and propose a Phase, Guarantee, and Utility (PGU) triad-based model to understand and guide the evaluation of various PPML solutions by decomposing their privacy-preserving functionalities. We discuss the unique characteristics and challenges of PPML and outline possible research directions that leverage, as well as benefit, multiple research communities such as ML, distributed systems, and security and privacy.

The Impact of Machine Learning Algorithms and Big Data on Privacy in Data Collection and Analysis

An Overview of Privacy in Machine Learning

When Machine Learning Meets Privacy: A Survey and Outlook

Machine learning and genomics: precision medicine vs. patient privacy

Privacy in the age of medical big data

State-of-the-Art Approaches to Enhancing Privacy Preservation of Machine Learning Datasets: A Survey

Transparency, Fairness, Data Protection, Neutrality: Data Management Challenges in the Face of New Regulation

AI-Driven Anonymization: Protecting Personal Data Privacy While Leveraging Machine Learning

Big data privacy: The datafication of personal information

A Statistical Overview on Data Privacy

Privacy at a Price: Exploring its Dual Impact on AI Fairness

Predictive Privacy: Collective Data Protection in the Context of AI and Big Data

Privacy Threats and Protection in Machine Learning

Privacy-Preserving Machine Learning: Methods, Challenges and Directions

Exploring the Evolving Dynamics of Data Privacy, Ethical Considerations, and Data Protection in the Digital Era

Privacy by design in big data: An overview of privacy enhancing technologies in the era of big data analytics

Achieving big data privacy in education

Privacy and artificial intelligence: challenges for protecting health information in a new era

On the Protection of Private Information in Machine Learning Systems: Two Recent Approaches

Elevating Big Data Privacy: Innovative Strategies and Challenges in Data Abundance

Technical Perspective: Graph Theory for Data Privacy: A New Approach for Complex Data Flows