CARM: CUDA-Accelerated RNS Multiplication in Word-Wise Homomorphic Encryption Schemes for Internet of Things

Shiyu Shen, Hao Yang, Yu Liu, Zhe Liu, Yunlei Zhao
DOI: https://doi.org/10.1109/tc.2022.3227874
IF: 3.183
2022-01-01
IEEE Transactions on Computers
Abstract: Homomorphic encryption (HE), which allows computation over encrypted data, has often been used to preserve privacy. However, the computationally heavy nature and complexity of network topologies make the deployment of HE schemes in the Internet of Things (IoT) scenario difficult. In this work, we propose CARM, the first optimized GPU implementation that covers BGV, BFV and CKKS, targeting the acceleration of homomorphic multiplication on GPUs in heterogeneous IoT systems. Our solution is suitable for accelerating RNS homomorphic multiplication on both high-performance and embedded GPUs, as it is a parametric and generic design and offers various trade-offs between resource and efficiency. We offer constant-time low-level arithmetic with minimum instructions and memory usage, as well as performance- and memory-prior configurations. Through this, we can provide more real-time evaluation results and relieve the computational pressure on cloud devices. We deploy our implementations on two GPUs. Compared to the CPU implementation, we achieve up to 378.4×, 234.5×, and 287.2× speedup for homomorphic multiplication of BGV, BFV, and CKKS on Tesla V100S, and 8.8×, 9.2×, and 10.3× on Jetson AGX Xavier, respectively.
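As background for the term "RNS multiplication" used in the abstract, the following is a minimal sketch of the residue number system idea on plain integers: a value modulo a large composite Q is split into residues modulo small pairwise-coprime moduli, multiplication is done independently per residue, and the result is recombined with the Chinese Remainder Theorem. The moduli, function names, and inputs are hypothetical choices for illustration. This is not the paper's GPU implementation, which operates on polynomial ciphertexts.

```python
from math import prod

# Hypothetical small pairwise-coprime moduli, chosen only for illustration.
# Word-wise HE schemes use many word-sized primes instead.
MODULI = [97, 101, 103]
Q = prod(MODULI)  # the composite modulus represented by the RNS basis


def to_rns(x):
    """Split an integer into its residues modulo each basis element."""
    return [x % q for q in MODULI]


def rns_mul(a, b):
    """Multiply two RNS values; each residue channel is independent."""
    return [(ai * bi) % q for ai, bi, q in zip(a, b, MODULI)]


def from_rns(residues):
    """Recombine residues into an integer modulo Q via the CRT."""
    total = 0
    for r, q in zip(residues, MODULI):
        q_hat = Q // q                            # product of the other moduli
        total += r * q_hat * pow(q_hat, -1, q)    # pow(..., -1, q): modular inverse
    return total % Q


x, y = 123456, 654321
product = from_rns(rns_mul(to_rns(x), to_rns(y)))
```

Because the residue channels never interact during multiplication, each one can be handled by a separate thread or thread block, which is the general property that makes RNS arithmetic a natural fit for GPUs.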
engineering, electrical & electronic; computer science, hardware & architecture
What problem does this paper attempt to address?