SAFE: A Scalable Homomorphic Encryption Accelerator for Vertical Federated Learning
Zhaohui Chen, Zhen Gu, Yanheng Lu, Xuanle Ren, Ruiguang Zhong, Wen-jie Lu, Jiansong Zhang, Yichi Zhang, Hanghang Wu, Xiaofu Zheng, Heng Liu, Tingqiang Chu, Cheng Hong, Changzheng Wei, Dimin Niu, Yuan Xie
DOI: https://doi.org/10.1109/tcad.2024.3496836
IF: 2.9
2024-01-01
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Abstract: Privacy preservation has become a critical concern for governments, hospitals, and large corporations. Homomorphic encryption (HE) enables a ciphertext-based computation paradigm with strong security guarantees. In emerging cross-agency data cooperation scenarios such as vertical federated learning (VFL), HE protects the exchanged data from exposure to counterparts. However, computation on ciphertexts faces significant performance challenges due to increased data size and substantial computational overhead. Prior work accelerates HE using parallel hardware such as GPUs, FPGAs, and ASICs. Many of these accelerators target specific HE operations, such as the number theoretic transform and key switching, and therefore provide limited performance improvement for end-to-end applications. Others support bootstrapping, which requires a very large ASIC design. To better support existing VFL training applications, we propose SAFE, an HE accelerator for scalable homomorphic matrix-vector products (HMVP), which are the performance bottleneck. SAFE adopts a coefficient-wise encoded HMVP algorithm. In addition to a vanilla mode, we explore compressed and concatenated modes, which can fully utilize the polynomial encoding slots. The proposed hardware architecture, customized for the HMVP dataflow, supports spatial and temporal parallelization of function units. The most costly polynomial function, the number theoretic transform, is implemented with a low-area constant-geometry unit, which improves efficiency by 2.43×. SAFE is implemented as a CPU-FPGA heterogeneous acceleration system, unleashing the multithread potential. The evaluation demonstrates up to a 36× speed-up in end-to-end federated logistic regression training.
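To illustrate what a coefficient-wise encoded matrix-vector product can look like, the following is a minimal plaintext sketch in Python. It is not taken from the paper: it uses one commonly described coefficient encoding over the ring Z[X]/(X^N + 1), leaves out encryption entirely, and does not attempt to model SAFE's vanilla, compressed, or concatenated modes. All function names (`negacyclic_mul`, `encode_matrix`, `encode_vector`, `matvec_via_poly`) are hypothetical.

```python
def negacyclic_mul(a, b, n):
    """Schoolbook product of two length-n coefficient lists in Z[X]/(X^n + 1)."""
    out = [0] * n
    for i, ai in enumerate(a):
        if ai == 0:
            continue
        for j, bj in enumerate(b):
            k = i + j
            if k < n:
                out[k] += ai * bj
            else:
                out[k - n] -= ai * bj  # wrap-around: X^n = -1
    return out


def encode_matrix(mat, n):
    """Assumed encoding: entry (i, j) goes to the coefficient of X^(i*cols + cols-1-j)."""
    rows, cols = len(mat), len(mat[0])
    assert rows * cols <= n, "matrix must fit in one polynomial"
    poly = [0] * n
    for i, row in enumerate(mat):
        for j, w in enumerate(row):
            poly[i * cols + (cols - 1 - j)] = w
    return poly


def encode_vector(vec, n):
    """Assumed encoding: entry j goes to the coefficient of X^j, zero-padded to n."""
    return list(vec) + [0] * (n - len(vec))


def matvec_via_poly(mat, vec, n):
    """Read row i of mat @ vec from coefficient i*cols + cols-1 of the product."""
    cols = len(vec)
    prod = negacyclic_mul(encode_matrix(mat, n), encode_vector(vec, n), n)
    return [prod[i * cols + cols - 1] for i in range(len(mat))]


if __name__ == "__main__":
    print(matvec_via_poly([[1, 2, 3], [4, 5, 6]], [7, 8, 9], 8))
```

In this sketch a single polynomial multiplication yields every row of the product at once, and only `rows` of the `n` output coefficients carry useful results. In an HE setting the vector polynomial would be a ciphertext and the multiplication would typically be carried out in the number theoretic transform domain rather than by the quadratic loop shown here.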