Abstract:

Background: Data sharing in multicenter medical research can improve the generalizability of research, accelerate progress, enhance collaborations among institutions, and lead to new discoveries from data pooled from multiple sources. Despite these benefits, many medical institutions are unwilling to share their data, as sharing may cause sensitive information to be leaked to researchers, other institutions, and unauthorized users. Great progress has been made in recent years in the development of secure machine learning frameworks based on homomorphic encryption; however, nearly all such frameworks use a single secret key and do not describe how to securely evaluate the trained model, which makes them impractical for multicenter medical applications.

Objective: The aim of this study is to provide a privacy-preserving machine learning protocol (eg, for logistic regression) for multiple data providers and researchers. This protocol allows researchers to train models and then evaluate them on medical data from multiple sources while providing privacy protection for both the sensitive data and the learned model.

Methods: We adapted a novel threshold homomorphic encryption scheme to meet the privacy requirements. We devised new relinearization key generation techniques for greater scalability and multiplicative depth, and new model training strategies for simultaneously training multiple models through x-fold cross-validation.

Results: Using a client-server architecture, we evaluated the performance of our protocol. The experimental results demonstrated that, with 10-fold cross-validation, our privacy-preserving logistic regression model training and evaluation over 10 attributes in a data set of 49,152 samples took approximately 7 minutes and 20 minutes, respectively.

Conclusions: We present the first privacy-preserving multiparty logistic regression model training and evaluation protocol based on threshold homomorphic encryption. Our protocol is practical for real-world use and may help promote multicenter medical research.
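The abstract mentions training multiple logistic regression models simultaneously through x-fold cross-validation. As a rough plaintext illustration of that idea only, the sketch below trains k models in lockstep, where model m skips the samples assigned to fold m. Everything here is an assumption made for illustration: the function names (`make_folds`, `train_cv_models`, `cv_accuracy`), the plain gradient descent update, and the hyperparameters are hypothetical, nothing is encrypted, and this is not the authors' protocol.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_folds(n_samples, k, seed=0):
    """Assign each sample index to one of k folds (sizes differ by at most 1)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    fold_of = [0] * n_samples
    for pos, i in enumerate(idx):
        fold_of[i] = pos % k
    return fold_of


def train_cv_models(X, y, k=10, epochs=200, lr=0.5, seed=0):
    """Train k logistic regression models in lockstep, one per held-out fold.

    Hypothetical plaintext sketch: model m never sees samples in fold m.
    """
    n, d = len(X), len(X[0])
    fold_of = make_folds(n, k, seed)
    W = [[0.0] * (d + 1) for _ in range(k)]  # last entry is the bias term
    train_sizes = [sum(1 for f in fold_of if f != m) for m in range(k)]
    for _ in range(epochs):
        grads = [[0.0] * (d + 1) for _ in range(k)]
        # One pass over the data updates the gradients of all k models.
        for xi, yi, fi in zip(X, y, fold_of):
            row = list(xi) + [1.0]
            for m in range(k):
                if m == fi:
                    continue  # sample is held out for model m
                p = sigmoid(sum(w * v for w, v in zip(W[m], row)))
                for j in range(d + 1):
                    grads[m][j] += (p - yi) * row[j]
        for m in range(k):
            for j in range(d + 1):
                W[m][j] -= lr * grads[m][j] / train_sizes[m]
    return W, fold_of


def cv_accuracy(X, y, W, fold_of):
    """Score each sample with the model that did not train on it."""
    correct = 0
    for xi, yi, fi in zip(X, y, fold_of):
        row = list(xi) + [1.0]
        p = sigmoid(sum(w * v for w, v in zip(W[fi], row)))
        correct += int((p >= 0.5) == (yi == 1))
    return correct / len(X)
```

Calling `train_cv_models(X, y, k=10)` on a list of feature rows `X` and 0/1 labels `y` returns one weight vector per fold together with the fold assignment, which `cv_accuracy` then uses to score every sample with the model that held it out. In a protocol like the one described, the arithmetic would presumably run over ciphertexts rather than plain floats; that part is not shown here.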

Private Hierarchical Clustering and Efficient Approximation

Privacy-Preserving Collaborative Deep Learning with Unreliable Participants

Privacy Preserving Distributed DBSCAN Clustering

Privacy Preserving PCA for Multiparty Modeling

Efficient Privacy-Preserving Machine Learning in Hierarchical Distributed System

Privacy-Preserving Optimal Parameter Selection for Collaborative Clustering

PPA-DBSCAN: Privacy-preserving ρ-Approximate Density-based Clustering

PPCL: Privacy-preserving collaborative learning for mitigating indirect information leakage

Improved Hierarchical Clustering on Massive Datasets with Broad Guarantees

Hierarchical Clustering via Single and Complete Linkage Using Fully Homomorphic Encryption

Insuring against the perils in distributed learning: privacy-preserving empirical risk minimization

Secure Byzantine-Robust Distributed Learning via Clustering

Efficient Clustering on Encrypted Data

Web-Based Privacy-Preserving Multicenter Medical Data Analysis Tools Via Threshold Homomorphic Encryption: Design and Development Study

Fair Polylog-Approximate Low-Cost Hierarchical Clustering

Differentially Private k-Means Clustering with Guaranteed Convergence

Privacy Preserving Collaborative Computing: Heterogeneous Privacy Guarantee and Efficient Incentive Mechanism

Privacy-Preserving Affinity Propagation Clustering over Vertically Partitioned Data

Privacy-Preserving Vertical Collaborative Logistic Regression without Trusted Third-Party Coordinator

HybridAlpha: An Efficient Approach for Privacy-Preserving Federated Learning

Achieving data utility-privacy tradeoff in Internet of Medical Things: A machine learning approach