A Study of Secure Algorithms for Vertical Federated Learning: Take Secure Logistic Regression as an Example

Huan-Chih Wang, Ja-Ling Wu
2024-10-30
Abstract: In the era of big data, more and more companies build services with machine learning techniques. However, it is costly for companies to collect data and extract helpful handcrafted features on their own. Although combining data from other companies is one way to boost a model's performance, this approach may be prohibited by law. In other words, finding the balance between sharing data with others and protecting data from privacy leakage is a crucial topic worthy of close attention. This paper focuses on distributed data and conducts secure model training tasks on a vertical federated learning scheme. Here, secure implies that the whole process is executed in the encrypted domain. Therefore, the privacy concern is relieved.
Cryptography and Security, Machine Learning
What problem does this paper attempt to address?
The core problem that this paper attempts to solve is: in the vertical federated learning framework, how to train machine-learning models, especially logistic regression models, through secure algorithms while ensuring data privacy. Specifically, the paper focuses on how to use homomorphic encryption technology to achieve secure data exchange and joint modeling when different companies or institutions cannot directly share data, thereby improving model performance and protecting data privacy.

### Main Problem Background

With the advent of the big data era, more and more companies are using machine-learning techniques to build services. However, the cost of collecting data and extracting useful features is high. Although sharing data with other companies can improve model performance, this practice may be restricted by laws and regulations. Therefore, while sharing data to improve model performance, how to ensure that data privacy is not leaked has become an important problem that needs to be solved urgently.

### Specific Objectives of the Paper

1. **Secure Data Exchange and Model Training**: In the vertical federated learning framework, design secure protocols so that participants can conduct data exchange and model training without revealing data.
2. **Apply Homomorphic Encryption**: Use homomorphic encryption (HE) technology to ensure that the entire training process is carried out in the ciphertext domain, thus relieving privacy concerns.
3. **Improve the Performance of Non-linear Classification Tasks**: Explore the application of kernel functions in logistic regression, especially their performance in dealing with non-linearly separable data sets, and evaluate their feasibility and performance in the ciphertext domain.

### Key Technical Means

- **Homomorphic Encryption (HE)**: Supports direct addition and multiplication operations on ciphertexts, ensuring that data remains encrypted during transmission and calculation.
- **Kernel Functions**: Introduce kernel functions (such as linear kernel, polynomial kernel, RBF kernel) to improve the performance of the logistic regression model in non-linear classification tasks.
- **Secure Multi-party Computation (SMPC)**: Combine SMPC technology to further enhance data privacy protection.

### Experimental Verification

The paper experimentally verifies the effectiveness of the proposed method, including:

- Using non-linearly separable data sets such as `make_circles` and `make_moons` to compare the influence of different kernel functions and approximate sigmoid functions on model performance.
- Testing real-world data sets (such as the low-birth-weight study data set and the prostate cancer study data set) to evaluate the performance of the model in practical applications.

In short, this paper aims to explore how to effectively improve the performance of the logistic regression model under the vertical federated learning framework through secure algorithms and homomorphic encryption technology while protecting data privacy.
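
To make the idea of computing in the ciphertext domain more concrete, the following is a minimal sketch rather than the paper's protocol. It uses a toy Paillier scheme, which is additively homomorphic only (ciphertext addition and multiplication by a plaintext scalar); the summary above does not say which HE scheme the authors use. The primes, the fixed-point `SCALE`, the party names, the weights, and the assumption that a separate key holder decrypts the result are all illustrative. Two parties hold different feature columns of the same sample, and their partial logistic-regression scores are summed without either score being revealed in the clear.

```python
import math
import random

SCALE = 1000  # fixed-point scale (assumption: 3 decimal places is enough here)

def keygen(p=1789, q=1861):
    """Toy Paillier key pair from two small primes (insecure, demo only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid because the generator is fixed to g = n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """c = (n+1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m % n, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, with L(x) = (x - 1) // n."""
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n

def add(n, c1, c2):
    """Homomorphic addition: multiplying ciphertexts adds plaintexts."""
    return (c1 * c2) % (n * n)

def scalar_mul(n, c, k):
    """Multiply the hidden plaintext by a public integer k."""
    return pow(c, k % n, n * n)

def encode(x, n):
    """Map a real number to a fixed-point residue mod n."""
    return round(x * SCALE) % n

def decode(m, n):
    """Inverse of encode; residues above n // 2 represent negatives."""
    if m > n // 2:
        m -= n
    return m / SCALE

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Hypothetical vertical split: party A and party B hold different features.
n, priv = keygen()
score_a = dot([0.5, 0.5], [1.0, 0.5])   # party A's partial score: 0.75
score_b = dot([-0.5], [2.5])            # party B's partial score: -1.25

enc_a = encrypt(n, encode(score_a, n))  # A sends only a ciphertext
enc_b = encrypt(n, encode(score_b, n))  # B sends only a ciphertext
enc_z = add(n, enc_a, enc_b)            # sum computed on ciphertexts

z = decode(decrypt(n, priv, enc_z), n)  # only the key holder can do this
print("combined linear score:", z)
```

Because the toy scheme cannot multiply two ciphertexts, anything beyond sums and public-scalar products (for example the sigmoid) would need either a scheme that supports ciphertext multiplication or an interactive protocol.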
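
The kernel functions and approximate sigmoid functions mentioned above can be illustrated with a short sketch. The three kernels follow their standard textbook definitions; the parameter defaults (`degree`, `coef0`, `gamma`) are arbitrary choices for illustration. For the sigmoid, the sketch uses the degree-3 Taylor expansion around zero, which is one common polynomial stand-in when only additions and multiplications are available; it is an assumption here, and the approximations actually compared in the paper may differ.

```python
import math

def linear_kernel(x, y):
    """k(x, y) = <x, y>"""
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree=2, coef0=1.0):
    """k(x, y) = (<x, y> + coef0) ** degree"""
    return (linear_kernel(x, y) + coef0) ** degree

def rbf_kernel(x, y, gamma=1.0):
    """k(x, y) = exp(-gamma * ||x - y||^2)"""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid(z):
    """Exact logistic function, for reference."""
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_taylor3(z):
    """Degree-3 Taylor approximation around 0: 1/2 + z/4 - z^3/48."""
    return 0.5 + z / 4.0 - z ** 3 / 48.0

# Two illustrative points, one on an inner and one on an outer circle.
inner = [0.3, 0.0]
outer = [1.0, 0.0]
print("linear:    ", linear_kernel(inner, outer))
print("polynomial:", polynomial_kernel(inner, outer))
print("rbf:       ", rbf_kernel(inner, outer))

# The approximation is tight near zero and degrades as |z| grows.
errors = {z: abs(sigmoid(z) - sigmoid_taylor3(z)) for z in (-2.0, -1.0, 0.0, 1.0, 2.0)}
for z, err in errors.items():
    print(f"z={z:+.1f}  |sigmoid - taylor3| = {err:.4f}")
```

The growing error away from zero is the usual reason such approximations are paired with bounded or normalized inputs.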