Abstract: There is a growing trend to outsource the inference task of large transformer models to cloud servers. However, this poses a severe threat to users' private data, which is exposed to the cloud server once uploaded. Although several works have attempted to provide private inference for transformer models, the hundreds of communication rounds they require limit their application scenarios. Motivated by the desire to minimize round complexity, we propose CipherFormer, a novel transformer private inference scheme using homomorphic encryption and garbled circuits. We present a protocol for quickly computing homomorphic matrix multiplications. We then modify the attention mechanism and design the corresponding garbled circuits. Furthermore, we show how to use a lightweight attention mechanism and mixed-bitwidth to reduce the inference latency while maintaining accuracy. In comparison with an advanced homomorphic encryption scheme on text classification tasks, our model improves accuracy by 3% to 11% while performing private inference with a 7.7x-11.9x speedup.
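The abstract does not spell out the matrix multiplication protocol, so the following is only a minimal sketch of one generic way to multiply two *encrypted* matrices with two messages (server to client and back): the server masks both operands with random matrices, the client decrypts, multiplies in the clear and re-encrypts, and the server strips the mask terms homomorphically using the identity $(A+R_1)(B+R_2) = AB + AR_2 + R_1B + R_1R_2$. Assumptions, stated loudly: this is not claimed to be CipherFormer's protocol; a toy Paillier scheme (additively homomorphic, tiny insecure primes) stands in for the lattice-based HE a real system would use, and because Paillier has no noise, the noise-reduction aspect is not illustrated; both parties run in one process sharing the key; the names `server_mask`, `client_multiply`, `server_unmask` and `he_matmul` are hypothetical.

```python
import random
from math import gcd, isqrt

def next_prime(k):
    """Smallest prime >= k, by trial division (fine for k around 1e6)."""
    while k < 2 or not all(k % d for d in range(2, isqrt(k) + 1)):
        k += 1
    return k

# Toy Paillier keys (additively homomorphic). INSECURE: tiny primes, demo only.
P, Q = next_prime(1_000_000), next_prime(1_000_030)
N, N2 = P * Q, (P * Q) ** 2
LAM = (P - 1) * (Q - 1) // gcd(P - 1, Q - 1)
MU = pow(LAM, -1, N)  # Python 3.8+; shortcut valid because the generator is N + 1

def enc(m):
    r = random.randrange(1, N)
    while gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (1 + (m % N) * N) * pow(r, N, N2) % N2

def dec(c):
    m = (pow(c, LAM, N2) - 1) // N * MU % N
    return m if m <= N // 2 else m - N  # map back to signed integers

def add(c1, c2):  # Enc(a), Enc(b) -> Enc(a + b)
    return c1 * c2 % N2

def smul(c, k):  # Enc(a), plaintext k -> Enc(k * a)
    return pow(c, k % N, N2)

def enc_mat(M):
    return [[enc(x) for x in row] for row in M]

def dec_mat(C):
    return [[dec(c) for c in row] for row in C]

def matmul(A, B):  # plaintext matrix product mod N
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) % N
             for j in range(len(B[0]))] for i in range(len(A))]

def he_matmul(A, B, a_encrypted):
    """Enc(A) @ B if a_encrypted else A @ Enc(B), using additions only."""
    rows, inner, cols = len(A), len(B), len(B[0])
    out = [[1] * cols for _ in range(rows)]  # 1 is a valid encryption of 0
    for i in range(rows):
        for j in range(cols):
            for t in range(inner):
                ct, k = (A[i][t], B[t][j]) if a_encrypted else (B[t][j], A[i][t])
                out[i][j] = add(out[i][j], smul(ct, k))
    return out

def server_mask(CA, CB):
    """Message 1 (server -> client): hide A and B under uniform masks R1, R2."""
    R1 = [[random.randrange(N) for _ in row] for row in CA]
    R2 = [[random.randrange(N) for _ in row] for row in CB]

    def mask(C, R):
        return [[add(c, enc(r)) for c, r in zip(cr, rr)] for cr, rr in zip(C, R)]

    return mask(CA, R1), mask(CB, R2), R1, R2

def client_multiply(mA, mB):
    """Message 2 (client -> server): decrypt masked operands, multiply, re-encrypt."""
    return enc_mat(matmul(dec_mat(mA), dec_mat(mB)))

def server_unmask(CC, CA, CB, R1, R2):
    """Enc(AB) = Enc((A+R1)(B+R2)) - Enc(A)R2 - R1 Enc(B) - R1R2."""
    AR2 = he_matmul(CA, R2, a_encrypted=True)
    R1B = he_matmul(R1, CB, a_encrypted=False)
    R1R2 = matmul(R1, R2)
    return [[add(add(add(CC[i][j], smul(AR2[i][j], -1)), smul(R1B[i][j], -1)),
                 enc(-R1R2[i][j]))
             for j in range(len(CC[0]))] for i in range(len(CC))]

A = [[1, -2], [3, 4]]
B = [[5, 6], [-7, 8]]
CA, CB = enc_mat(A), enc_mat(B)      # e.g. encrypted outputs of earlier layers
mA, mB, R1, R2 = server_mask(CA, CB)  # message 1
CC = client_multiply(mA, mB)          # message 2
print(dec_mat(server_unmask(CC, CA, CB, R1, R2)))  # [[19, -10], [-13, 50]]
```

The client only ever sees $A + R_1$ and $B + R_2$ with uniformly random masks, so it learns nothing about $A$ or $B$. In a lattice-based scheme, the re-encryption step in `client_multiply` would also hand the server a fresh, low-noise ciphertext, which is one plausible reading of the "reduced noise growth" claim.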

What problem does this paper attempt to address?

The main problem this paper addresses is how to protect users' private data when the inference tasks of large Transformer models are outsourced to cloud servers. Specifically, the paper proposes CipherFormer, a method that combines homomorphic encryption (HE) and garbled circuits (GC) to reduce the number of communication rounds and to improve the efficiency and accuracy of private inference.

### Problem Background

With the wide adoption of Transformer models in fields such as natural language processing (NLP) and computer vision (CV), more and more users outsource inference with these large models to cloud servers. This practice poses serious privacy risks, because users' private data may be exposed or misused once it has been uploaded to the cloud.

### Limitations of Existing Solutions

Previous studies have proposed private inference schemes for Transformer models, but these schemes usually require a large number of communication rounds (often hundreds), which limits their practical use. In addition, although HE-based methods can perform multiplication and addition non-interactively, they remain inaccurate and inefficient for complex operations such as the Softmax function.

### Innovations of CipherFormer

To overcome these challenges, CipherFormer introduces the following:

1. **Efficient Ciphertext Matrix Multiplication**:
   - A new interactive protocol completes ciphertext matrix multiplication within 2 rounds of communication, reducing communication overhead and noise growth.
2. **Custom-Designed Garbled Circuit**:
   - A dedicated garbled circuit is designed for the Softmax function in the attention mechanism, simplifying the computation and improving efficiency.
3. **Optimization Strategies**:
   - A lightweight attention mechanism reduces computational complexity and communication overhead.
   - Mixed bitwidth further reduces the latency of non-linear function evaluation with little effect on model accuracy.

### Experimental Results

Experiments show that, compared with an advanced homomorphic encryption scheme, CipherFormer improves accuracy by 3% to 11% on text classification tasks while achieving a 7.7x to 11.9x speedup.

### Summary

By combining homomorphic encryption and garbled circuits, CipherFormer addresses the excessive number of communication rounds and the low efficiency of private Transformer inference, giving users an efficient and accurate privacy-preserving scheme.
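The mixed-bitwidth idea can be illustrated in plain Python: values leave the linear layers in a wide fixed-point format and are narrowed before a nonlinearity is evaluated, trading a small, bounded rounding error for a smaller circuit. This is a sketch under assumptions, not the paper's configuration: the 32-bit (16 fractional bits) and 16-bit (8 fractional bits) formats, the helper names, and the AND-gate estimate in `relu_and_gates` (which assumes free-XOR, a ripple-carry adder to reconstruct the shared value, and a per-bit multiplexer on the sign bit) are all hypothetical, and no garbled circuit is actually built.

```python
def saturate(v, bits):
    """Clamp an integer to the signed range of a `bits`-wide word."""
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    return max(lo, min(hi, v))

def encode(x, frac_bits, bits):
    """Round a real number to signed fixed point with `frac_bits` fractional bits."""
    return saturate(round(x * (1 << frac_bits)), bits)

def decode(v, frac_bits):
    return v / (1 << frac_bits)

def narrow(v, from_frac, to_frac, to_bits):
    """Drop low-order fractional bits (arithmetic shift = floor), then clamp."""
    return saturate(v >> (from_frac - to_frac), to_bits)

def relu_and_gates(bits):
    """Rough AND-gate count for ReLU on two additive shares inside a garbled
    circuit, assuming free-XOR: about `bits` ANDs for the ripple-carry adder
    that reconstructs x, plus `bits` ANDs to zero x when its sign bit is set."""
    return 2 * bits

# Hypothetical formats: linear layers output 32-bit words with 16 fractional
# bits; the nonlinearity runs on 16-bit words with 8 fractional bits.
x = [-1.7, -0.01, 0.4, 2.3]
wide = [encode(v, 16, 32) for v in x]
relu_narrow = [decode(max(narrow(v, 16, 8, 16), 0), 8) for v in wide]
print(relu_narrow)                               # [0.0, 0.0, 0.3984375, 2.296875]
print(relu_and_gates(32), relu_and_gates(16))    # 64 32
```

Because ReLU is 1-Lipschitz, its output error is bounded by the narrowing error: at most $2^{-17}$ from the initial rounding plus just under $2^{-8}$ from the floor shift, assuming no saturation occurs. Under the stated gate-count assumption, halving the word width roughly halves the circuit for this nonlinearity.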

CipherFormer: Efficient Transformer Private Inference with Low Round Complexity

CHEETAH: An Ultra-Fast, Approximation-Free, and Privacy-Preserved Neural Network Framework based on Joint Obscure Linear and Nonlinear Computations

East: Efficient and Accurate Secure Transformer Framework for Inference

LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers

Primer: Fast Private Transformer Inference on Encrypted Data

Comet: A Communication-efficient and Performant Approximation for Private Transformer Inference

Efficient and Privacy-Preserving Tree-Based Inference via Additive Homomorphic Encryption

PPTIF: Privacy-Preserving Transformer Inference Framework for Language Translation

MLFormer: a high performance MPC linear inference framework for transformers

Secure Transformer Inference Protocol

A Survey on Private Transformer Inference

PrivCirNet: Efficient Private Inference via Block Circulant Transformation

CryptoGCN: Fast and Scalable Homomorphically Encrypted Graph Convolutional Network Inference

Optimized Privacy-Preserving CNN Inference With Fully Homomorphic Encryption

A Secure Convolutional Neural Network Inference Model Based on Homomorphic Encryption

Faster CryptoNets: Leveraging Sparsity for Real-World Encrypted Inference

Encryption-Friendly LLM Architecture

Falcon: Accelerating Homomorphically Encrypted Convolutions for Efficient Private Mobile Network Inference

Nimbus: Secure and Efficient Two-Party Inference for Transformers

Computation-efficient Deep Model Training for Ciphertext-based Cross-silo Federated Learning

Towards Fast and Scalable Private Inference