Abstract: Homomorphic encryption is one of the main approaches to building secure and privacy-preserving solutions for Machine Learning as a Service. This motivates the development of homomorphic algorithms for the main building blocks of AI, typically the components of the various types of neural network architectures. Among those components, we focus on the Softmax function, defined by $\mathrm{SM}(\mathbf{x}) = \left(\exp(x_i) / \sum_{j=1}^n \exp(x_j) \right)_{1\le i\le n}$. This function is deemed to be one of the most difficult to evaluate homomorphically, because of its multivariate nature and the very large range of values taken by $\exp(x_i)$. The available homomorphic algorithms remain restricted, especially in large dimensions, while important applications such as Large Language Models (LLMs) require computing Softmax over high-dimensional vectors. In terms of the multiplicative depth of the computation (a suitable measure of cost for homomorphic algorithms), our algorithm achieves $O(\log n)$ complexity for a fixed range of inputs, where $n$ is the Softmax dimension. Our algorithm is especially well suited to the situation where many Softmax evaluations must be computed at the same time, as in the LLM setting. In that case, assuming that all Softmax calls are packed into $m$ ciphertexts, the asymptotic amortized multiplicative depth cost per ciphertext is, again over a fixed range, $O(1 + m/N)$, where $N$ is the homomorphic ring degree. The main ingredient of our algorithms is a normalize-and-square strategy, which interleaves the computation of the exponential over a large range with normalization, decomposing both into smaller steps that are more stable and cheaper. In practice, our experiments show good accuracy and a gain of a factor of 2.5 to 8 over state-of-the-art solutions.
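
The normalize-and-square strategy mentioned in the abstract rests on a simple identity, sketched here as a worked derivation. The symbols $\mathbf{y}$, $S$, and the index $j$ are introduced only for this illustration; the abstract itself does not fix this notation. Let $\mathbf{y} = \mathrm{SM}(\mathbf{x}/2^{j})$ and $S = \sum_{t=1}^n \exp(x_t/2^{j})$, so that $y_i = \exp(x_i/2^{j})/S$. Squaring and renormalizing then gives:

$$
\frac{y_i^2}{\sum_{t=1}^n y_t^2}
= \frac{\exp(x_i/2^{j-1})/S^2}{\sum_{t=1}^n \exp(x_t/2^{j-1})/S^2}
= \frac{\exp(x_i/2^{j-1})}{\sum_{t=1}^n \exp(x_t/2^{j-1})}
= \mathrm{SM}(\mathbf{x}/2^{j-1})_i .
$$

Each square-and-normalize step therefore doubles the effective input, so the exponential only ever needs to be evaluated on the shrunken input $\mathbf{x}/2^{k}$ for some starting index $k$, and $k$ such steps recover $\mathrm{SM}(\mathbf{x})$.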


### What problem does this paper attempt to solve?

This paper addresses the problem of computing the Softmax function efficiently and accurately under Homomorphic Encryption (HE). Softmax is used in neural networks to convert outputs into probability distributions, but it combines exponentials with a normalization step, both of which are hard to evaluate under HE, especially when the input vector has a large dimension.

#### Main challenges:

1. **Numerical stability**: The exponential in Softmax may overflow or underflow, especially in fixed-point arithmetic.
2. **Difficulty of normalization**: Under HE, normalization (summing all components and inverting the sum) is very expensive unless the range of the normalization factor is well controlled.
3. **High-dimensional computation**: Applications such as large language models (LLMs) require Softmax over high-dimensional vectors, which is a major challenge for existing HE algorithms.

#### Solutions:

The authors propose a "normalize-and-square" strategy that addresses these challenges as follows:

- **Step-by-step squaring and normalization**: Decompose the Softmax computation into several smaller and more stable steps, avoiding both the direct evaluation of the exponential over a large range and a single costly normalization.
- **Iterative algorithm**: Use the recurrence $ \text{Softmax}(x/2^{j - 1})_i=\frac{\text{Softmax}(x/2^{j})_i^2}{\sum_{t = 1}^n\text{Softmax}(x/2^{j})_t^2} $, which starts from Softmax of a scaled-down input and doubles the effective input at each step until it reaches $\text{Softmax}(x)$.
- **Parallel computation**: Use the SIMD capabilities of the CKKS homomorphic encryption scheme to compute many Softmax functions simultaneously across multiple ciphertexts, improving efficiency.
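
The squaring-and-normalization loop described above can be sketched in plaintext as follows. This is a minimal illustration in NumPy, not the authors' homomorphic implementation: the function names and the parameter `k` (number of squaring steps) are hypothetical, and the exact `np.exp` and division used here would have to be replaced by polynomial approximations over controlled ranges in an actual CKKS evaluation.

```python
import numpy as np

def softmax_reference(x):
    """Numerically stable plaintext Softmax, used only as a reference."""
    z = np.exp(x - np.max(x))
    return z / z.sum()

def normalize_and_square(x, k):
    """Plaintext sketch: compute Softmax(x / 2**k), then square and renormalize k times.

    `k` is a hypothetical parameter chosen so that x / 2**k lies in a small range.
    """
    # Initial step on the shrunken input, where exp takes values in a narrow range.
    y = np.exp(x / 2.0**k)
    y = y / y.sum()
    for _ in range(k):
        y = y * y        # squaring doubles the effective input: x/2**j -> x/2**(j-1)
        y = y / y.sum()  # renormalizing keeps the entries in [0, 1] and summing to 1
    return y

x = np.array([-30.0, -2.5, 0.0, 4.0, 12.0])
approx = normalize_and_square(x, k=6)
exact = softmax_reference(x)
print("max abs error:", np.max(np.abs(approx - exact)))
```

In this sketch the exponential is only ever evaluated on `x / 2**k`, and every intermediate vector is itself a probability vector, which is the range control that the challenges above call for.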
#### Experimental results:

- **Accuracy**: Experiments show that the algorithm achieves about 16 bits of accuracy in the worst case and about 20 bits on average.
- **Efficiency**: Compared with the best existing solutions, the algorithm is 2.5 to 8 times faster in practice, especially when many Softmax functions must be computed in parallel.
- **Scalability**: The algorithm can handle 8192 Softmax computations of dimension 256 on a single-threaded CPU, and Softmax computations arising in large language models (such as LLaMa) on a GPU.

In conclusion, this paper improves both the efficiency and the accuracy of Softmax computation under homomorphic encryption through its normalize-and-square algorithm design, supporting Privacy-Preserving Machine Learning (PPML).

Fast and Accurate Homomorphic Softmax Evaluation
