HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform

Xiangchen Meng, Zijun Jiang, Yangdi Lyu
2024-10-07
Abstract: Polynomial multiplication is one of the fundamental operations in many applications, such as fully homomorphic encryption (FHE). However, the computational inefficiency stemming from polynomials with many large-bit-width coefficients poses a significant challenge for the practical implementation of FHE. The Number Theoretic Transform (NTT) has proven to be an effective tool for accelerating polynomial multiplication, but a fast and adaptable method for generating NTT accelerators is lacking. In this paper, we introduce HF-NTT, a novel NTT accelerator. HF-NTT efficiently handles polynomials of varying degrees and moduli, allowing a balance between performance and hardware resources by adjusting the number of Processing Elements (PEs). Meanwhile, we introduce a data movement strategy that eliminates the need for bit-reversal operations, resolves different hazards, and reduces the number of clock cycles. Furthermore, our accelerator includes a hardware-friendly modular multiplication design and a configurable PE capable of adapting its data path, resulting in a universal architecture. We synthesized and implemented a prototype using Vivado 2022.2 and evaluated it on the Xilinx Virtex-7 FPGA platform. The results demonstrate significant improvements in Area-Time-Product (ATP) and processing speed for different polynomial degrees. In scenarios involving multi-modulus polynomial multiplication, our prototype consistently outperforms other designs in both ATP and latency metrics.
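The abstract's claim that an NTT accelerator can avoid bit-reversal operations can be illustrated in software with a well-known pairing, which is not necessarily HF-NTT's own data-movement scheme: a Cooley-Tukey forward NTT takes coefficients in natural order and leaves them in bit-reversed order, and a Gentleman-Sande inverse NTT consumes bit-reversed input and returns natural order. The sketch below assumes the negacyclic ring Z_q[x]/(x^n + 1) used by common RLWE-based FHE schemes, a prime q with q ≡ 1 (mod 2n), and n a power of two; the function names, the parameters in the test, and the brute-force root search are illustrative assumptions.

```python
def bit_reverse(i: int, bits: int) -> int:
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def find_psi(n: int, q: int) -> int:
    """Brute-force a primitive 2n-th root of unity modulo the prime q."""
    if (q - 1) % (2 * n):
        raise ValueError("need q = 1 (mod 2n)")
    for g in range(2, q):
        psi = pow(g, (q - 1) // (2 * n), q)
        if pow(psi, n, q) == q - 1:  # psi^n = -1, so psi has order exactly 2n
            return psi
    raise ValueError("no primitive 2n-th root of unity found")


def twiddles(n: int, q: int, psi: int):
    """Powers of psi and psi^-1 stored in bit-reversed order (precomputed, like a twiddle ROM)."""
    bits = n.bit_length() - 1
    psi_inv = pow(psi, -1, q)
    fwd = [pow(psi, bit_reverse(i, bits), q) for i in range(n)]
    inv = [pow(psi_inv, bit_reverse(i, bits), q) for i in range(n)]
    return fwd, inv


def ntt(a, q, psi_rev):
    """Cooley-Tukey forward NTT: natural-order input, bit-reversed output."""
    a, n = list(a), len(a)
    t, m = n, 1
    while m < n:
        t //= 2
        for i in range(m):
            j1, s = 2 * i * t, psi_rev[m + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t] * s % q
                a[j], a[j + t] = (u + v) % q, (u - v) % q  # CT butterfly
        m *= 2
    return a


def intt(a, q, psi_inv_rev):
    """Gentleman-Sande inverse NTT: bit-reversed input, natural-order output."""
    a, n = list(a), len(a)
    t, m = 1, n
    while m > 1:
        h, j1 = m // 2, 0
        for i in range(h):
            s = psi_inv_rev[h + i]
            for j in range(j1, j1 + t):
                u, v = a[j], a[j + t]
                a[j], a[j + t] = (u + v) % q, (u - v) * s % q  # GS butterfly
            j1 += 2 * t
        t, m = 2 * t, h
    n_inv = pow(n, -1, q)
    return [x * n_inv % q for x in a]


def polymul_ntt(a, b, q):
    """Negacyclic product a*b in Z_q[x]/(x^n + 1) without any bit-reversal permutation."""
    n = len(a)
    fwd, inv = twiddles(n, q, find_psi(n, q))
    # Pointwise product: both operands share the same bit-reversed order.
    prod_hat = [x * y % q for x, y in zip(ntt(a, q, fwd), ntt(b, q, fwd))]
    return intt(prod_hat, q, inv)


def polymul_schoolbook(a, b, q):
    """Reference O(n^2) negacyclic product: x^n wraps around to -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            if k < n:
                c[k] = (c[k] + a[i] * b[j]) % q
            else:
                c[k - n] = (c[k - n] - a[i] * b[j]) % q
    return c
```

Because the pointwise product does not depend on element order, the bit-reversed NTT outputs never need to be reordered; the permutation is absorbed into the precomputed twiddle tables. This software sketch does not model the memory banking, PE scheduling, or hazard resolution that the paper addresses.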
Hardware Architecture, Cryptography and Security
### What problem does this paper attempt to solve?

This paper aims to solve the low computational efficiency of fully homomorphic encryption (FHE), which stems from polynomials with many large-bit-width coefficients. Specifically:

1. **Computational Bottleneck**:
   - Polynomial multiplication is a basic operation in many applications, such as FHE. However, because polynomial coefficients are hundreds of bits wide, it is computationally expensive, which poses a significant challenge to the practical implementation of FHE.
   - Although the Number Theoretic Transform (NTT) has proven to be an effective tool for accelerating polynomial multiplication, fast and adaptable methods for generating NTT accelerators are currently lacking.
2. **Hardware Resource Optimization**:
   - The paper proposes a new NTT accelerator, HF-NTT, that efficiently processes polynomials of different degrees and moduli. Adjusting the number of processing elements (PEs) trades performance against hardware resources.
   - HF-NTT introduces a data movement strategy that eliminates the need for bit-reversal operations, resolves different types of hazards, and reduces the number of clock cycles.
   - The accelerator also includes a hardware-friendly modular multiplication design and configurable PEs whose data paths adapt to the task at hand, yielding a general-purpose architecture.
3. **Limitations of Existing Methods**:
   - Existing hardware designs have limitations when processing polynomials with large-bit-width coefficients. For example, a fixed polynomial degree and a small, unchangeable number of PEs restrict their applicability.
   - As design complexity grows, these hardware solutions struggle to maintain performance and area efficiency with larger moduli and ciphertext lengths, and their control logic becomes harder to implement.
   - Memory overhead is also a challenge. For example, Du et al. build memory with a ping-pong mechanism, which leads to complex control logic and doubles the storage area.

### Solutions

To solve these problems, the paper proposes HF-NTT, an innovative NTT accelerator. Its main contributions include:

1. **New Data Storage and Movement Methods**:
   - Eliminate memory conflicts between PEs and memory, as well as stalls in PEs, in accelerators with different numbers of PEs.
2. **Use of the Residue Number System (RNS) to Decompose Large-Bit-Width Coefficients**:
   - Improve the adaptability and resource utilization of the hardware platform.
3. **Configurable Butterfly Units (CBUs)**:
   - Perform multiple operations, including NTT, multiplication, and INTT, and optimize the multiplier using the Barrett algorithm and DSP slices to improve performance-area efficiency on the FPGA.
4. **High-Efficiency Implementation**:
   - Implement a high-performance, area-efficient HF-NTT accelerator on the Xilinx Virtex-7 FPGA platform, and demonstrate its adaptability to large-scale designs through experiments and design-parameter exploration.

Through these improvements, HF-NTT significantly improves the computational efficiency of polynomial multiplication in FHE, reduces hardware resource consumption, and increases the flexibility and adaptability of the system.
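The RNS and CBU contributions can be illustrated with a small software model. In this sketch, which is an assumption and not the authors' RTL, a `Barrett` class performs classic Barrett reduction of a product below q², a hypothetical `cbu` function reuses one modular adder, subtractor and Barrett multiplier in three modes (a Cooley-Tukey butterfly for NTT, a Gentleman-Sande butterfly for INTT, and a multiply-only `mul` mode such as the final scaling by the inverse of n), and `rns_decompose`/`crt_reconstruct` split a wide integer into residues modulo small pairwise-coprime moduli and recombine them with the Chinese Remainder Theorem. The moduli 3329, 7681 and 12289 are illustrative NTT-friendly primes, not the paper's parameters.

```python
from math import prod


class Barrett:
    """Barrett reduction modulo q for inputs x < 2**(2k), where k = q.bit_length()."""

    def __init__(self, q: int):
        self.q = q
        self.k = q.bit_length()
        self.mu = (1 << (2 * self.k)) // q  # precomputed once per modulus

    def reduce(self, x: int) -> int:
        # Estimate floor(x / q) with two shifts and one multiply; the estimate
        # is at most 2 below the true quotient, so r lies in [0, 3q).
        est = ((x >> (self.k - 1)) * self.mu) >> (self.k + 1)
        r = x - est * self.q
        while r >= self.q:  # at most two correction subtractions
            r -= self.q
        return r

    def mulmod(self, a: int, b: int) -> int:
        # a, b are assumed already reduced (< q), so a * b < q**2 < 2**(2k).
        return self.reduce(a * b)


def cbu(mode: str, a: int, b: int, w: int, br: Barrett) -> tuple:
    """Hypothetical configurable butterfly: one adder, one subtractor and one
    Barrett multiplier, re-wired according to `mode`."""
    q = br.q
    if mode == "ct":  # forward NTT (Cooley-Tukey): (a + w*b, a - w*b)
        t = br.mulmod(b, w)
        return (a + t) % q, (a - t) % q
    if mode == "gs":  # inverse NTT (Gentleman-Sande): (a + b, (a - b)*w)
        return (a + b) % q, br.mulmod((a - b) % q, w)
    if mode == "mul":  # multiplier only: scale both inputs by w
        return br.mulmod(a, w), br.mulmod(b, w)
    raise ValueError(f"unknown mode {mode!r}")


def rns_decompose(x: int, moduli) -> list:
    """Split a wide integer into one small residue per RNS channel."""
    return [x % m for m in moduli]


def crt_reconstruct(residues, moduli) -> int:
    """Recombine residues with the Chinese Remainder Theorem (moduli pairwise coprime)."""
    big_m = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        x += r * m_i * pow(m_i, -1, m)
    return x % big_m


def rns_mul(x: int, y: int, moduli) -> int:
    """Multiply channel by channel; exact whenever x * y < prod(moduli)."""
    chans = [Barrett(m) for m in moduli]
    rz = [br.mulmod(a, b) for br, a, b in
          zip(chans, rns_decompose(x, moduli), rns_decompose(y, moduli))]
    return crt_reconstruct(rz, moduli)


MODULI = (3329, 7681, 12289)  # illustrative NTT-friendly primes
```

In hardware, the shifts in `reduce` are free wiring and the correction loop becomes at most two conditional subtractions, which is one reason Barrett reduction maps well onto DSP multipliers. Each RNS channel can run an independent NTT over its own small modulus, which is one way a fixed-width datapath can serve coefficients of varying bit width.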