QFL: Federated Learning Acceleration Based on QAT Hardware Accelerator.

Kai Cheng, Ziyang Zou, Zian Wang, Jian Yang, Shuangwu Chen
DOI: https://doi.org/10.1145/3661725.3661747
2024-01-01
Abstract: Federated Learning (FL) enables geographically dispersed organizations to collaboratively train a machine learning model. In this process, a parameter server updates and synchronizes the global model by receiving and aggregating model data from multiple clients. To secure this process, clients use homomorphic encryption (HE) algorithms to preserve data privacy. However, HE introduces large computational overhead (the cost of data encryption and decryption) and communication overhead (multiple rounds of FL communication, with ciphertext expansion of more than 150× in each round), and eventually becomes the performance bottleneck of the entire FL system. In this paper, we present QFL, a system solution for FL based on the Intel QAT (QuickAssist Technology) hardware accelerator that substantially reduces the computation and communication overhead caused by HE. Building on an optimized HE algorithm, we use coroutines to offload HE modular exponentiation operations to the QAT concurrently and asynchronously, and use an event-driven mechanism to retrieve QAT results promptly, which reduces computing overhead. By combining an error-feedback gradient compression algorithm with QAT hardware-accelerated Huffman coding, we greatly reduce communication overhead, accelerate server-side gradient aggregation, and reduce system complexity. Our solution improves encryption throughput by 16× compared with the open-source Python encryption library python-Paillier [1]. Compared with the state-of-the-art FL framework with HE [32], our solution reduces training time by 3× when reaching the same test accuracy.
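To make the role of modular exponentiation concrete, here is a minimal sketch of additively homomorphic aggregation using a toy Paillier scheme in pure Python. It is not QFL's implementation and does not use QAT or python-Paillier. The function names (keygen, encrypt, decrypt, add_encrypted), the tiny primes, and the assumption that gradients are already quantized to small non-negative integers are all illustrative choices made here. Every pow(..., n * n) call is the kind of modular exponentiation the abstract describes offloading to hardware.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two primes (far too small for real use)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # mu = (L(g^lam mod n^2))^-1 mod n, where L(x) = (x - 1) // n
    mu = pow((pow(g, lam, n * n) - 1) // n, -1, n)
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """c = g^m * r^n mod n^2; both factors are modular exponentiations."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n."""
    n, _ = pub
    lam, mu = priv
    return ((pow(c, lam, n * n) - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)


# Hypothetical scenario: three clients each hold one quantized gradient value.
pub, priv = keygen(1009, 1013)  # toy primes, assumption for illustration only
client_updates = [3, 5, 7]
ciphertexts = [encrypt(pub, u) for u in client_updates]

# The server aggregates ciphertexts without seeing any plaintext.
aggregate = ciphertexts[0]
for c in ciphertexts[1:]:
    aggregate = add_encrypted(pub, aggregate, c)

print(decrypt(pub, priv, aggregate))  # 15
```

In this sketch each ciphertext lives modulo n^2, so it is far larger than the small integer it encodes. Real key sizes widen that gap considerably, which gives a rough sense of why the abstract treats ciphertext expansion as a communication cost.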