Abstract: In the era of big data, more and more companies build services with machine learning techniques. However, it is costly for companies to collect data and extract useful handcrafted features on their own. Combining data with other companies can boost model performance, but this approach may be prohibited by law. Finding the balance between sharing data with others and protecting data from privacy leakage is therefore a crucial topic. This paper focuses on distributed data and conducts secure model training tasks under a vertical federated learning scheme. Here, secure means that the whole process is executed in the encrypted domain, which relieves the privacy concern.

What problem does this paper attempt to address?

The core problem that this paper attempts to solve is how to train machine-learning models, especially logistic regression models, with secure algorithms in the vertical federated learning framework while ensuring data privacy. Specifically, the paper focuses on how to use homomorphic encryption to achieve secure data exchange and joint modeling when different companies or institutions cannot directly share data, thereby improving model performance while protecting data privacy.

### Main Problem Background

With the advent of the big data era, more and more companies are using machine-learning techniques to build services. However, the cost of collecting data and extracting useful features is high. Although sharing data with other companies can improve model performance, this practice may be restricted by laws and regulations. Ensuring that data privacy is not leaked while sharing data to improve model performance has therefore become an urgent problem.

### Specific Objectives of the Paper

1. **Secure Data Exchange and Model Training**: In the vertical federated learning framework, design secure protocols so that participants can exchange data and train models without revealing their data.
2. **Apply Homomorphic Encryption**: Use homomorphic encryption (HE) to ensure that the entire training process is carried out in the ciphertext domain, thus relieving privacy concerns.
3. **Improve the Performance of Non-linear Classification Tasks**: Explore the application of kernel functions in logistic regression, especially on non-linearly separable data sets, and evaluate their feasibility and performance in the ciphertext domain.

### Key Technical Means

- **Homomorphic Encryption (HE)**: Supports direct addition and multiplication operations on ciphertexts, ensuring that data remains encrypted during transmission and computation.
- **Kernel Functions**: Introduce kernel functions (such as the linear kernel, polynomial kernel, and RBF kernel) to improve the performance of the logistic regression model in non-linear classification tasks.
- **Secure Multi-party Computation (SMPC)**: Combine SMPC to further enhance data privacy protection.

### Experimental Verification

The paper experimentally verifies the effectiveness of the proposed method, including:

- Using non-linearly separable data sets such as `make_circles` and `make_moons` to compare the influence of different kernel functions and approximate sigmoid functions on model performance.
- Testing real-world data sets (such as the low-birth-weight study data set and the prostate cancer study data set) to evaluate the performance of the model in practical applications.

In short, this paper explores how to improve the performance of the logistic regression model under the vertical federated learning framework through secure algorithms and homomorphic encryption while protecting data privacy.
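The kernel functions and approximate sigmoid functions mentioned above can be illustrated with a small plaintext sketch. Because homomorphic encryption evaluates additions and multiplications on ciphertexts, the sigmoid is commonly replaced by a low-degree polynomial. The code below is only an illustration under stated assumptions: the function names, the default `degree`, `coef0`, and `gamma` values, and the choice of a degree-3 Taylor expansion are hypothetical and not taken from the paper, and no encryption is performed.

```python
import math


def linear_kernel(x, z):
    """Inner product of two feature vectors."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, coef0=1.0):
    """(x . z + coef0) ** degree; degree and coef0 are illustrative defaults."""
    return (linear_kernel(x, z) + coef0) ** degree


def rbf_kernel(x, z, gamma=1.0):
    """exp(-gamma * ||x - z||^2); gamma is an illustrative default."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


def sigmoid(t):
    """Exact logistic function, shown for comparison only."""
    return 1.0 / (1.0 + math.exp(-t))


def sigmoid_taylor3(t):
    """Degree-3 Taylor expansion of the sigmoid around 0.

    It uses only additions and multiplications (by constants and by t),
    so it is the kind of expression that can be evaluated on ciphertexts.
    It is accurate only for small |t|.
    """
    return 0.5 + t / 4.0 - (t * t * t) / 48.0


if __name__ == "__main__":
    # Vertical partitioning: each party holds a different slice of the
    # features of the same sample (hypothetical two-party split).
    x_a, x_b = [1.0, 2.0], [3.0]
    z_a, z_b = [4.0, 5.0], [6.0]

    # The linear kernel over the full feature vector is the sum of the
    # per-party partial kernels, so each party can compute its share locally.
    full = linear_kernel(x_a + x_b, z_a + z_b)
    parts = linear_kernel(x_a, z_a) + linear_kernel(x_b, z_b)
    print("full:", full, "sum of parts:", parts)

    # Compare the exact sigmoid with its polynomial approximation.
    for t in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(t, sigmoid(t), sigmoid_taylor3(t))
```

The approximation error grows quickly outside a small interval around zero, which is one reason the choice of approximate sigmoid affects model performance.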

A Study of Secure Algorithms for Vertical Federated Learning: Take Secure Logistic Regression as an Example
