Accurate Low-Degree Polynomial Approximation of Non-polynomial Operators for Fast Private Inference in Homomorphic Encryption

Jianming Tong, Jingtian Dang, Anupam Golder, Callie Hao, Arijit Raychowdhury, Tushar Krishna
2024-05-08
Abstract: As machine learning (ML) permeates fields like healthcare, facial recognition, and blockchain, the need to protect sensitive data intensifies. Fully Homomorphic Encryption (FHE) allows inference on encrypted data, preserving the privacy of both the data and the ML model. However, it slows down non-secure inference by up to five orders of magnitude, and the root cause is the replacement of non-polynomial operators (ReLU and MaxPooling) with high-degree Polynomial Approximated Functions (PAFs). We propose SmartPAF, a framework that replaces non-polynomial operators with low-degree PAFs and then recovers the accuracy of the PAF-approximated model through four techniques: (1) Coefficient Tuning (CT): adjust PAF coefficients based on the input distributions before training, (2) Progressive Approximation (PA): progressively replace one non-polynomial operator at a time, followed by fine-tuning, (3) Alternate Training (AT): alternate the training between PAFs and other linear operators in a decoupled manner, and (4) Dynamic Scale (DS) / Static Scale (SS): dynamically scale the PAF input value within (-1, 1) in training, and fix the scale as the running max value in FHE deployment. The synergistic effect of CT, PA, AT, and DS/SS enables SmartPAF to enhance the accuracy of various models approximated by PAFs of various low degrees on multiple datasets. For ResNet-18 on ImageNet-1k, the Pareto frontier found by SmartPAF in the latency-accuracy tradeoff space achieves 1.42x ~ 13.64x accuracy improvement and 6.79x ~ 14.9x speedup over prior works. Further, SmartPAF enables a 14-degree PAF ($f_1^2 \circ g_1^2$) to achieve a 7.81x speedup compared to the 27-degree PAF obtained by minimax approximation, with the same 69.4% post-replacement accuracy. Our code is available at
Cryptography and Security
### What problem does this paper attempt to solve?

This paper addresses the performance bottleneck of machine learning (ML) inference under fully homomorphic encryption (FHE). When FHE is used to run ML inference on encrypted sensitive data, non-polynomial operators (such as ReLU and MaxPooling) cause significant latency, making inference up to five orders of magnitude slower than the non-secure version. To improve inference speed while maintaining high accuracy, the authors propose a framework called SmartPAF, which replaces these non-polynomial operators with low-degree polynomial approximated functions (PAFs).

#### Main challenges

1. **Processing of non-polynomial operators**:
   - FHE does not support non-polynomial operators directly, so they must be replaced with PAFs.
   - High-degree PAFs improve accuracy but introduce many multiplications, which increases latency.
   - Low-degree PAFs reduce latency but can degrade accuracy.
2. **Limitations of existing methods**:
   - Hybrid schemes offload non-polynomial operators to other secure schemes, which leads to excessive communication overhead.
   - Existing PAF approximation methods have difficulty converging with high-degree PAFs, and low-degree PAFs perform poorly on complex tasks.

#### Solutions

To overcome these challenges, the authors propose the SmartPAF framework, which combines four techniques:

1. **Coefficient Tuning (CT)**:
   - Adjust the PAF coefficients according to the input distribution before training, to reduce approximation error and improve model accuracy.
2. **Progressive Approximation (PA)**:
   - Replace one non-polynomial operator at a time and fine-tune after each replacement, so that training converges.
3. **Alternate Training (AT)**:
   - Alternate between training the PAF coefficients and training the other linear-layer parameters, so that the two do not interfere with each other; this improves convergence and accuracy.
4. **Dynamic Scale / Static Scale (DS / SS)**:
   - During training, dynamically scale the PAF input values into the range (-1, 1) to improve approximation accuracy; at deployment, fix the scale to the running maximum, since FHE requires a fixed scaling factor.

With these techniques, SmartPAF achieves higher accuracy and faster inference across multiple datasets and models. For example, on ResNet-18, a 14-degree PAF tuned by SmartPAF reaches the same 69.4% validation accuracy as the 27-degree minimax PAF while delivering a 7.81x speedup.

### Summary

The paper targets the performance bottleneck caused by non-polynomial operators under FHE and proposes the SmartPAF framework to achieve fast and accurate private inference.
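The operator replacement and the DS/SS scaling described under Solutions can be sketched on plaintext floats as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the cubic `f1(x) = 1.5x - 0.5x^3` is a standard sign-iteration polynomial chosen here for simplicity (SmartPAF's tuned coefficients and its `g_1` polynomial are not reproduced), and the names `sign_paf`, `relu_paf`, `max_paf`, and `ScaleTracker` are hypothetical. No FHE library is involved; the point is only that ReLU and max reduce to additions and multiplications once sign is approximated by a polynomial on inputs scaled into [-1, 1].

```python
def f1(x):
    """Odd cubic (assumed for illustration) that pushes values in (-1, 1) toward -1 or +1."""
    return 1.5 * x - 0.5 * x ** 3


def sign_paf(x, depth=2):
    """Approximate sign(x) for x in [-1, 1] by composing f1 `depth` times."""
    for _ in range(depth):
        x = f1(x)
    return x


def relu_paf(x, scale, depth=2):
    """Polynomial ReLU via ReLU(x) = (x + x * sign(x)) / 2, with sign taken on x / scale."""
    return 0.5 * (x + x * sign_paf(x / scale, depth))


def max_paf(a, b, scale, depth=2):
    """Polynomial max for pooling via max(a, b) = ((a + b) + (a - b) * sign(a - b)) / 2."""
    d = a - b
    return 0.5 * ((a + b) + d * sign_paf(d / scale, depth))


class ScaleTracker:
    """Hypothetical helper: per-batch scale for training, running max for deployment."""

    def __init__(self, eps=1e-12):
        self.eps = eps              # guards against division by zero on all-zero batches
        self.running_max = eps

    def dynamic_scale(self, batch):
        """Dynamic Scale: use the current batch's max |x| and update the running max."""
        batch_max = max(abs(v) for v in batch)
        self.running_max = max(self.running_max, batch_max)
        return max(batch_max, self.eps)

    def static_scale(self):
        """Static Scale: a fixed value (the running max) usable at FHE deployment."""
        return self.running_max


if __name__ == "__main__":
    tracker = ScaleTracker()
    for batch in ([0.5, -3.0, 2.0], [1.0, -0.25]):
        s = tracker.dynamic_scale(batch)
        print([round(relu_paf(v, s, depth=4), 4) for v in batch])
    print("static scale:", tracker.static_scale())
```

Increasing `depth` sharpens the sign approximation at the cost of more multiplications, which is the latency-accuracy tradeoff the summary describes; inputs close to zero after scaling are approximated least accurately.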