Abstract: Vertical Federated Learning (VFL) enables multiple data owners, each holding a different subset of features about a largely overlapping set of data samples, to collaboratively train a global model. The quality of the data owners' local features affects the performance of the VFL model, which makes feature selection vitally important. However, existing feature selection methods for VFL assume prior knowledge of either the number of noisy features or the post-training threshold used to select useful features, making them unsuitable for practical applications. To bridge this gap, we propose the Federated Stochastic Dual-Gate based Feature Selection (FedSDG-FS) approach. It uses a Gaussian stochastic dual-gate to efficiently approximate the probability of a feature being selected. FedSDG-FS further incorporates a local embedding perturbation approach to achieve differential privacy for local training data. To reduce overhead, we propose a feature importance initialization method based on Gini impurity, which requires only two parameter transmissions between the server and the clients. The enhanced version, FedSDG-FS++, protects the privacy of both the clients' training data and the server's labels through Partially Homomorphic Encryption (PHE) without relying on a trusted third party. Theoretically, we analyze the convergence rate, privacy guarantees, and security of our methods. Extensive experiments on both synthetic and real-world datasets show that FedSDG-FS and FedSDG-FS++ significantly outperform existing approaches, selecting high-quality features more accurately and improving VFL performance in a privacy-preserving manner.
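
As a rough illustration of how Gini impurity can rank features, the sketch below scores a single feature by the largest impurity reduction achievable with one threshold split. This is the textbook, plaintext computation only, not the FedSDG-FS protocol: the abstract does not say how the scores are computed or exchanged, and the encryption and two-round transmission are omitted here. The function names (`gini_impurity`, `gini_gain`, `initial_importance`) and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(feature, labels, threshold):
    """Impurity reduction obtained by splitting the samples at `threshold`."""
    left = [y for x, y in zip(feature, labels) if x <= threshold]
    right = [y for x, y in zip(feature, labels) if x > threshold]
    n = len(labels)
    weighted = (len(left) / n) * gini_impurity(left) \
        + (len(right) / n) * gini_impurity(right)
    return gini_impurity(labels) - weighted


def initial_importance(feature, labels):
    """Hypothetical score: best gain over midpoints of sorted distinct values."""
    values = sorted(set(feature))
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return max((gini_gain(feature, labels, t) for t in thresholds), default=0.0)


# Toy data (assumed for illustration): one feature separates the classes,
# the other does not.
labels = [0, 0, 0, 1, 1, 1]
informative = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
noisy = [0.5, 0.1, 0.9, 0.4, 0.2, 0.8]

score_informative = initial_importance(informative, labels)
score_noisy = initial_importance(noisy, labels)
```

On this toy data the informative feature receives a higher score than the noisy one. In a vertical setting the features and labels sit with different parties, which is why a real protocol would need a privacy mechanism around this computation.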