Fregata: Fast Private Inference with Unified Secure Two-Party Protocols

Xuanang Yang, Jing Chen, Yuqing Li, Kun He, Xiaojie Huang, Zikuan Jiang, Hao Bai, Ruiying Du
DOI: https://doi.org/10.1109/tifs.2024.3444327
IF: 7.231
2024-01-01
IEEE Transactions on Information Forensics and Security
Abstract: Private Inference (PI) safeguards client and server privacy when the client uses the server’s model to make predictions. Existing PI solutions for Convolutional Neural Networks (CNNs) employ distinct cryptographic primitives to customize secure two-party protocols for linear and non-linear layers. Switching between these protocols requires converting data into a protocol-specific form, which significantly increases inference latency. In this paper, we present Fregata, a fast PI scheme for CNNs that uses identical cryptographic primitives to compute both linear and non-linear layers. Specifically, our protocols use homomorphic encryption to obtain additive secret shares of matrix products during the offline phase, followed by lightweight multiplication and addition operations on these shares in the latency-sensitive online phase. Benefiting from this uniformity, we accelerate inference from a holistic perspective by decoupling certain procedures of our protocols and executing them asynchronously. Moreover, to improve the efficiency of the offline phase, we devise a homomorphic matrix multiplication method with lower computation and communication complexity than existing approaches. Furthermore, we minimize inference latency by employing graphics processing units to parallelize the operations on the shares during the online phase. Experimental evaluations on popular CNN models such as SqueezeNet, ResNet, and DenseNet demonstrate that Fregata reduces inference latency by 35-45 times compared with state-of-the-art counterparts, accompanied by a 1.6-2.8 times decrease in communication overhead. In terms of total runtime, Fregata maintains a reduction of approximately 3 times.
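The offline/online split described in the abstract can be illustrated with a minimal sketch of additive secret sharing for one linear layer. This is not Fregata's actual protocol. It assumes a generic preprocessing pattern in which the server holds a weight matrix W, the client holds an input x, and the offline phase yields additive shares of W·r for a client-chosen random mask r. The homomorphic encryption step is replaced by a computation in the clear purely for illustration, and the ring size and all function names (`offline_phase`, `online_phase`, `reconstruct`) are hypothetical.

```python
import random

MOD = 2**32  # assumed ring Z_{2^32}; chosen only for illustration


def matvec(W, v):
    """Matrix-vector product over the ring Z_MOD."""
    return [sum(w * x for w, x in zip(row, v)) % MOD for row in W]


def offline_phase(W, n, rng):
    """Input-independent preprocessing.

    A real protocol would use homomorphic encryption so that neither
    party learns the other's values; here the shares of W*r are simply
    computed in the clear (an assumption of this sketch).
    """
    r = [rng.randrange(MOD) for _ in range(n)]        # client's random mask
    s = [rng.randrange(MOD) for _ in range(len(W))]   # server's random share
    Wr = matvec(W, r)
    client_share = [(a - b) % MOD for a, b in zip(Wr, s)]  # W*r - s
    return r, s, client_share


def online_phase(W, x, r, s, client_share):
    """Latency-sensitive phase: only cheap additions and multiplications."""
    masked = [(xi - ri) % MOD for xi, ri in zip(x, r)]  # sent client -> server
    server_share = [(a + b) % MOD for a, b in zip(matvec(W, masked), s)]
    return server_share, client_share  # additive shares of W*x


def reconstruct(share_a, share_b):
    """Add the two shares to recover the plaintext result."""
    return [(a + b) % MOD for a, b in zip(share_a, share_b)]


W = [[1, 2], [3, 4]]   # server's private weights
x = [5, 6]             # client's private input
rng = random.Random(0)
r, s, c = offline_phase(W, len(x), rng)
server_share, client_share = online_phase(W, x, r, s, c)
result = reconstruct(server_share, client_share)
```

The shares sum to W(x − r) + s + (W·r − s) = W·x, so all costly cryptographic work is confined to the offline phase and the online phase touches only masked values and shares.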