Accelerating Vertical Federated Learning

Dongqi Cai, Tao Fan, Yan Kang, Lixin Fan, Mengwei Xu, Shangguang Wang, Qiang Yang
DOI: https://doi.org/10.1109/TBDATA.2022.3192898
2024-01-21
Abstract: Privacy, security, and data governance constraints rule out brute-force integration of cross-silo data, a need that has accompanied the development of the Internet of Things. Federated learning was proposed so that all parties can collaboratively complete a training task without their data leaving local premises. Vertical federated learning is a specialization of federated learning for distributed features. To preserve privacy, homomorphic encryption is applied so that operations can be performed on encrypted data without decrypting it. However, along with its robust security guarantee, homomorphic encryption brings extra communication and computation overhead. In this paper, we comprehensively and quantitatively analyze the current bottlenecks of vertical federated learning under homomorphic encryption. We propose a straggler-resilient and computation-efficient acceleration system that reduces communication overhead in heterogeneous scenarios by up to 65.26% and reduces the computation overhead caused by homomorphic encryption by up to 40.66%. Our system improves the robustness and efficiency of current vertical federated learning frameworks without loss of security.
Cryptography and Security, Performance
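To make the two HE costs named in the abstract concrete, here is a toy sketch assuming the Paillier cryptosystem, a common additively homomorphic scheme in VFL (the abstract does not name the scheme used). Multiplying two ciphertexts adds the hidden plaintexts, and raising a ciphertext to a plaintext power scales it, so a party can compute on values it cannot read. Each ciphertext is an element of Z_{n^2}, so with a 2048-bit modulus it takes about 512 bytes versus 8 bytes for a float64 value, which illustrates the communication inflation; every operation is also a big-integer modular exponentiation or multiplication, which illustrates the computation cost. All function names are hypothetical, and the demo primes are far too small to be secure.

```python
import math
import random
from typing import Tuple


def keygen(p: int, q: int) -> Tuple[int, Tuple[int, int]]:
    """Toy Paillier keys with g = n + 1. Demo-sized primes only: NOT secure."""
    n = p * q
    lam = math.lcm(p - 1, q - 1)
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu is simply lam^-1 mod n.
    mu = pow(lam, -1, n)
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """c = g^m * r^n mod n^2 with a fresh random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n: int, sk: Tuple[int, int], c: int) -> int:
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return (u - 1) // n * mu % n


def add_cipher(n: int, c1: int, c2: int) -> int:
    """Enc(a) * Enc(b) mod n^2 decrypts to a + b (mod n)."""
    return c1 * c2 % (n * n)


def mul_plain(n: int, c: int, k: int) -> int:
    """Enc(a)^k mod n^2 decrypts to k * a (mod n)."""
    return pow(c, k, n * n)


def ciphertext_bytes(modulus_bits: int) -> int:
    """Ciphertexts live in Z_{n^2}, so they are about twice the modulus size."""
    return 2 * modulus_bits // 8


if __name__ == "__main__":
    rng = random.Random(0)
    n, sk = keygen(293, 433)
    a, b = encrypt(n, 15, rng), encrypt(n, 27, rng)
    print(decrypt(n, sk, add_cipher(n, a, b)))  # 42
    print(decrypt(n, sk, mul_plain(n, a, 3)))   # 45
    print(ciphertext_bytes(2048), "bytes per ciphertext vs 8 for a float64")
```

A production system would use a vetted library and a modulus of at least 2048 bits; the small primes here only keep the arithmetic readable.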
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper proposes solutions to the communication and computation overhead of Vertical Federated Learning (VFL) under Homomorphic Encryption (HE).

#### Background and Motivation

1. **Background**:
   - Federated Learning (FL) allows different institutions to collaboratively train models without sharing raw data, addressing privacy and security concerns.
   - Vertical Federated Learning (VFL) targets distributed-feature scenarios, in which different institutions hold different features (feature dimensions) for the same samples.
   - Homomorphic Encryption, a privacy-preserving technique, is widely used in VFL but introduces additional communication and computation overhead.
2. **Current Bottlenecks**:
   - Homomorphic Encryption inflates the size of the exchanged data, increasing communication latency.
   - In heterogeneous network environments, this communication latency is further exacerbated.
   - Computation overhead increases, especially during homomorphic encryption operations.

#### Main Contributions

1. **System Measurement**:
   - The first systematic measurement of VFL on FATE, an industrial-grade federated learning framework, revealing its main performance bottlenecks.
2. **Accelerated System**:
   - An acceleration system that addresses these bottlenecks in VFL while maintaining security.
3. **Experimental Validation**:
   - Extensive experiments validate the system's effectiveness, making VFL more efficient in practical applications.

### Specific Solutions

1. **Backup Worker Mechanism**:
   - Combines backup workers with the Stale Synchronous Parallel (SSP) parameter server scheme, using stale backup data to fill in data that has not yet arrived.
   - Reduces waiting time when the network is unstable, improving overall efficiency.
2. **Dynamic Feature Selection**:
   - Uses Principal Component Analysis (PCA) to compress the input matrix, reducing the number of multiplications and thus the computation overhead.
   - Generates a compressed matrix at the beginning of each iteration, balancing efficiency and accuracy.

Through these methods, the paper delivers a practical acceleration system that reduces communication overhead in heterogeneous scenarios by up to 65.26% and HE-induced computation overhead by up to 40.66%, improving the efficiency and robustness of existing VFL frameworks without compromising security.
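The two mechanisms above can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the paper's implementation. `BackupBuffer` assumes that "backup data" means each party's most recently received intermediate result, reused only while it is at most `staleness_bound` iterations old (the SSP condition). `pca_compress` assumes a plain PCA projection, computed here by power iteration with deflation on one party's local feature matrix. All class, function, and parameter names are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


class BackupBuffer:
    """Hypothetical backup store: latest (iteration, value) received from each party."""

    def __init__(self, staleness_bound: int) -> None:
        self.s = staleness_bound
        self.latest: Dict[str, Tuple[int, object]] = {}

    def receive(self, party: str, iteration: int, value: object) -> None:
        # Overwrite with the newest intermediate result from this party.
        self.latest[party] = (iteration, value)

    def collect(self, iteration: int, parties: List[str]) -> Optional[Dict[str, object]]:
        """One value per party for `iteration`; a stale value is reused only if it is
        at most `s` iterations old. Returns None when the caller still has to wait."""
        out: Dict[str, object] = {}
        for p in parties:
            if p not in self.latest:
                return None
            it, val = self.latest[p]
            if iteration - it > self.s:
                return None
            out[p] = val
        return out


def _matvec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def top_components(x: Matrix, k: int, iters: int = 200) -> Tuple[List[float], Matrix]:
    """Column means and top-k covariance eigenvectors via power iteration + deflation."""
    n, d = len(x), len(x[0])
    mean = [sum(row[j] for row in x) / n for j in range(d)]
    xc = [[row[j] - mean[j] for j in range(d)] for row in x]
    cov = [[sum(r[i] * r[j] for r in xc) / n for j in range(d)] for i in range(d)]
    comps: Matrix = []
    for _ in range(k):
        v = [1.0 + j for j in range(d)]  # arbitrary non-zero start vector
        norm = math.sqrt(sum(t * t for t in v))
        v = [t / norm for t in v]
        for _ in range(iters):
            w = _matvec(cov, v)
            norm = math.sqrt(sum(t * t for t in w))
            if norm == 0.0:  # no variance left in the deflated covariance
                break
            v = [t / norm for t in w]
        lam = sum(a * b for a, b in zip(v, _matvec(cov, v)))
        comps.append(v)
        # Deflation: subtract lam * v v^T so the next pass finds the next direction.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return mean, comps


def pca_compress(x: Matrix, k: int) -> Matrix:
    """Project the centred rows of x (n x d) onto k components, giving n x k."""
    mean, comps = top_components(x, k)
    return [[sum((row[j] - mean[j]) * c[j] for j in range(len(row))) for c in comps]
            for row in x]
```

With this design, `collect` lets an aggregator proceed as soon as every party has a value that is fresh enough, instead of blocking on the slowest party each round. Projecting d features onto k components means each sample is combined with k values rather than d, which under HE would translate into fewer ciphertext-plaintext multiplications; how the paper chooses k for each iteration is not described in this summary.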