PHEP: Paillier Homomorphic Encryption Processors for Privacy-Preserving Applications in Cloud Computing

Guiming Shi, Yi Li, Xueqiang Wang, Zhanhong Tan, Dapeng Cao, Jingwei Cai, Yuchen Wei, Zehua Li, Wuke Zhang, Yifu Wu, Wei Xu, Kaisheng Ma
DOI: https://doi.org/10.1109/HCS59251.2023.10254692
2023-01-01
Abstract:
• Cloud computing has become key infrastructure for emerging applications and stores massive amounts of data. How to handle this sensitive data safely in a shared cloud remains a major concern. Paillier homomorphic encryption is an important privacy-protection approach that permits arithmetic operations on ciphertext without first decrypting it, offering a viable solution to this privacy dilemma.
• The Paillier approach carries a significant computational overhead compared to plaintext computation, because computing in the ciphertext domain requires expensive large-integer modular operations that CPUs handle inefficiently. It is therefore preferable to build domain-specific processors for Paillier. Paillier computing patterns fall into two types, both widely used in Paillier applications: independent vector operations and multiply-and-accumulate (MAC) operations. The former are used primarily in applications such as private information retrieval and on the client side of privacy-preserving AI. The latter are required for cloud-side AI inference, particularly for computing convolutions in neural networks.
• We introduce PHEP: Paillier Homomorphic Encryption Processors for cloud-based privacy-preserving applications. PHEP is built on two Paillier acceleration chips, Paillier engine-1 and Paillier engine-2, both produced on the same wafer. Paillier engine-1 focuses on vector operations and aims to maximize compute throughput. It contains 80 processing elements (PEs) and provides 480 TOPS (INT8) on a 16-chip Full-Height-Full-Length (FHFL) PCIe card. Paillier engine-2 is designed for MAC operations and has 16 high-performance bit-serial sparse PEs. It delivers only 192 TOPS (INT8) on an 8-chip FHFL PCIe board, but it is specialized for matrix operations such as convolutions.
Both engine chips have the same hardware interface, allowing them to share the same PCB, FPGA scheduler, and software framework design. The PHEP accelerator card also contains a host FPGA, which schedules both data transfers and computation among the engine chips. To manage these engines, we use a software stack comprising an offline compiler and an online task scheduler that automatically balances the compute workload across multiple cards on the same server and even across multiple servers. The end-to-end evaluation shows that PHEP runs Paillier-based machine learning workloads 1–2 orders of magnitude faster than state-of-the-art CPUs (Intel Xeon Platinum 8260M with 192 cores), making these privacy-preserving applications practical.
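As a rough illustration of the two computing patterns named in the abstract, the Python sketch below implements a toy Paillier scheme and shows an independent vector operation (element-wise encrypted addition) next to a MAC operation (an encrypted dot product with plaintext weights). It is not PHEP's hardware or software implementation. The function names (keygen, encrypt, decrypt, he_add, he_scalar_mul, vector_add, mac), the tiny primes, and the common simplification g = n + 1 are all assumptions made for illustration; real deployments use moduli of 2048 bits or more, which is where the large-integer modular arithmetic becomes expensive.

```python
import math
import secrets


def keygen(p, q):
    """Toy key generation from two primes (assumed small, for illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # common simplification of the generator
    mu = pow(lam, -1, n)      # with g = n + 1, mu is the inverse of lam mod n
    return (n, g), (lam, mu)


def encrypt(pub, m):
    """c = g^m * r^n mod n^2 with a random r coprime to n."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2


def decrypt(pub, priv, c):
    """m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n


def he_add(pub, c1, c2):
    """Ciphertext multiplication corresponds to plaintext addition."""
    n, _ = pub
    return (c1 * c2) % (n * n)


def he_scalar_mul(pub, c, k):
    """Ciphertext exponentiation corresponds to multiplying the plaintext by k."""
    n, _ = pub
    return pow(c, k, n * n)


def vector_add(pub, xs, ys):
    """Independent vector operation: every element is processed on its own."""
    return [he_add(pub, x, y) for x, y in zip(xs, ys)]


def mac(pub, weights, xs):
    """MAC operation: encrypted sum of weights[i] * x[i] for plaintext weights."""
    acc = 1  # 1 is a valid (non-randomized) encryption of 0
    for w, x in zip(weights, xs):
        acc = he_add(pub, acc, he_scalar_mul(pub, x, w))
    return acc


if __name__ == "__main__":
    pub, priv = keygen(1009, 1013)  # toy primes, far too small for real use
    xs = [encrypt(pub, v) for v in [5, 6, 7]]
    ys = [encrypt(pub, v) for v in [10, 20, 30]]
    print([decrypt(pub, priv, c) for c in vector_add(pub, xs, ys)])  # [15, 26, 37]
    print(decrypt(pub, priv, mac(pub, [2, 3, 4], xs)))               # 56
```

In this sketch the vector operation has no dependency between elements, so it parallelizes trivially, while the MAC operation chains modular exponentiations and multiplications into a single accumulator. That difference is one plausible reading of why the abstract treats the two patterns separately.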