Abstract: Federated learning addresses the problems of data silos and data fragmentation while preserving privacy. However, the cryptographic algorithms used in federated learning significantly increase computational complexity, which limits the speed of model training. In this paper, we propose a hardware/software (HW/SW) co-designed field programmable gate array (FPGA) accelerator for federated learning. First, we analyze the time consumption of each stage of federated learning and of the cryptographic algorithms involved, and identify the performance bottleneck. Second, we introduce a HW/SW co-designed architecture that accelerates encryption, decryption, and ciphertext-space computation without reconfiguring the FPGA circuit. In the HW part, we propose a Hardware-aware Montgomery Algorithm (HWMA) that exploits data parallelism and pipelining, and we design an FPGA architecture that decouples data access from computation. In the SW part, we design an Operator Scheduling Engine (OSE) that flexibly decomposes the target algorithm into multiple HWMA calls and performs the remaining non-computation-intensive calculations. Finally, we evaluate the accelerator on both individual algorithms and practical applications. Experimental results show that, when deployed on an Intel Stratix 10 FPGA, our accelerator raises the throughput of 2048-bit modular multiplication, modular exponentiation, and the Paillier algorithm to more than 3x that of the CPU. When integrated into an industrial-grade open-source federated learning framework, it reduces the end-to-end training time of linear regression and logistic regression by 2.28x and 3.30x, respectively, which is more than 2x faster than the best previously reported FPGA accelerator results.
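The split described above, where the software layer breaks Paillier into modular multiplications and exponentiations while handling light arithmetic itself, can be illustrated with a small software sketch. It assumes textbook Montgomery reduction (REDC) and textbook Paillier with generator g = n + 1; it is not the paper's HWMA or OSE, all function names (`mont_mul`, `mont_pow`, `mod_mul`, `keygen`, `encrypt`, `decrypt`, `add_ciphertexts`) are hypothetical, and the tiny primes are for demonstration only and offer no security.

```python
from math import gcd
import secrets

def mont_setup(n):
    """Montgomery constants for an odd modulus n, with R = 2**k > n."""
    k = n.bit_length()
    r = 1 << k
    n_prime = (-pow(n, -1, r)) % r  # n * n_prime == -1 (mod R)
    r2 = (r * r) % n                # R^2 mod n, used to enter Montgomery form
    return k, r, n_prime, r2

def mont_mul(a, b, n, k, n_prime):
    """REDC: return a * b * R^-1 mod n for a, b in [0, n)."""
    mask = (1 << k) - 1
    t = a * b
    m = ((t & mask) * n_prime) & mask  # makes t + m*n divisible by R
    u = (t + m * n) >> k               # u < 2n, so one subtraction suffices
    return u - n if u >= n else u

def mod_mul(a, b, n):
    """Plain modular multiplication expressed as two Montgomery calls."""
    k, _, n_prime, r2 = mont_setup(n)
    a_m = mont_mul(a % n, r2, n, k, n_prime)    # a * R mod n
    return mont_mul(a_m, b % n, n, k, n_prime)  # a * b mod n

def mont_pow(base, exp, n):
    """Left-to-right square-and-multiply built only from Montgomery calls."""
    k, _, n_prime, r2 = mont_setup(n)
    x = mont_mul(base % n, r2, n, k, n_prime)  # base in Montgomery form
    acc = mont_mul(1, r2, n, k, n_prime)       # 1 in Montgomery form
    for bit in bin(exp)[2:]:
        acc = mont_mul(acc, acc, n, k, n_prime)
        if bit == "1":
            acc = mont_mul(acc, x, n, k, n_prime)
    return mont_mul(acc, 1, n, k, n_prime)     # leave Montgomery form

def keygen(p, q):
    """Paillier keys with g = n + 1, so mu = lambda^-1 mod n."""
    n = p * q
    lam = (p - 1) * (q - 1) // gcd(p - 1, q - 1)
    return n, lam, pow(lam, -1, n)

def encrypt(m, n):
    n2 = n * n
    while True:  # random r coprime to n
        r = secrets.randbelow(n - 1) + 1
        if gcd(r, n) == 1:
            break
    # (1 + n)^m == 1 + m*n (mod n^2), so only r^n needs a modexp
    return mod_mul(1 + m * n, mont_pow(r, n, n2), n2)

def decrypt(c, n, lam, mu):
    x = mont_pow(c, lam, n * n)     # heavy step: modexp mod n^2
    return ((x - 1) // n) * mu % n  # light step: L(x) * mu mod n

def add_ciphertexts(c1, c2, n):
    """Ciphertext-space addition is one modular multiplication mod n^2."""
    return mod_mul(c1, c2, n * n)

n, lam, mu = keygen(293, 433)
c = add_ciphertexts(encrypt(15, n), encrypt(27, n), n)
print(decrypt(c, n, lam, mu))  # 42
```

In this sketch every heavy operation (encryption's r^n, decryption's c^λ, ciphertext addition) reduces to repeated `mont_mul` calls, while the L-function, the final scaling by μ, and the sampling of r stay in ordinary Python. This mirrors, only loosely, the abstract's division between accelerated modular arithmetic and software-side scheduling.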

QFL: Federated Learning Acceleration Based on QAT Hardware Accelerator

HQsFL: A Novel Training Strategy for Constructing High-performance and Quantum-safe Federated Learning

FLBooster: A Unified and Efficient Platform for Federated Learning Acceleration

HAFLO: GPU-Based Acceleration for Federated Logistic Regression

Secure Federated Learning with Model Compression

BatchCrypt: Efficient Homomorphic Encryption for Cross-Silo Federated Learning

AQUILA: Communication Efficient Federated Learning with Adaptive Quantization in Device Selection Strategy

AsyFed: Accelerated Federated Learning with Asynchronous Communication Mechanism

FLASHE: Additively Symmetric Homomorphic Encryption for Cross-Silo Federated Learning

Communication Efficient Federated Learning with Adaptive Quantization

QuanCrypt-FL: Quantized Homomorphic Encryption with Pruning for Secure Federated Learning

FLCP: federated learning framework with communication-efficient and privacy-preserving

FLAIRS: FPGA-Accelerated Inference-Resistant & Secure Federated Learning

CryptoQFL: Quantum Federated Learning on Encrypted Data

PipeFL: Hardware/Software co-Design of an FPGA Accelerator for Federated Learning

XFL: A High Performace, Lightweighted Federated Learning Framework

Efficient asynchronous federated learning with sparsification and quantization

QuAsyncFL: Asynchronous Federated Learning With Quantization for Cloud–Edge–Terminal Collaboration Enabled AIoT

DeFTA: A Plug-and-Play Peer-to-Peer Decentralized Federated Learning Framework

AEDFL: Efficient Asynchronous Decentralized Federated Learning with Heterogeneous Devices

FedFQ: Federated Learning with Fine-Grained Quantization