Abstract: Polynomial multiplication is one of the fundamental operations in many applications, such as fully homomorphic encryption (FHE). However, the computational inefficiency stemming from polynomials with many large-bit coefficients poses a significant challenge for the practical implementation of FHE. The Number Theoretic Transform (NTT) has proven an effective tool for speeding up polynomial multiplication, but a fast and adaptable method for generating NTT accelerators is lacking. In this paper, we introduce HF-NTT, a novel NTT accelerator. HF-NTT efficiently handles polynomials of varying degrees and moduli, allowing for a balance between performance and hardware resources by adjusting the number of Processing Elements (PEs). We also introduce a data movement strategy that eliminates the need for bit-reversal operations, resolves different hazards, and reduces the number of clock cycles. Furthermore, our accelerator includes a hardware-friendly modular multiplication design and a configurable PE capable of adapting its data path, resulting in a universal architecture. We synthesized and implemented a prototype using Vivado 2022.2 and evaluated it on the Xilinx Virtex-7 FPGA platform. The results demonstrate significant improvements in Area-Time-Product (ATP) and processing speed for different polynomial degrees. In scenarios involving multi-modulus polynomial multiplication, our prototype consistently outperforms other designs in both ATP and latency metrics.
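
To make the role of the NTT concrete, the following is a minimal software sketch of NTT-based polynomial multiplication. It is not the HF-NTT hardware dataflow: it uses a plain recursive radix-2 transform, toy parameters chosen for illustration (`Q = 17`, `N = 8`, `OMEGA = 9`, none of which come from the paper), and cyclic convolution modulo x^N - 1 rather than the negacyclic variant commonly used in FHE. All function names are hypothetical.

```python
# Illustrative sketch only: toy parameters, not taken from the paper.
Q = 17      # small prime modulus with N | (Q - 1)
N = 8       # transform length (power of two)
OMEGA = 9   # primitive N-th root of unity mod Q (9**4 % 17 == 16, 9**8 % 17 == 1)


def ntt(a, omega, q):
    """Recursive radix-2 NTT of a list whose length is a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    omega_sq = omega * omega % q
    even = ntt(a[0::2], omega_sq, q)
    odd = ntt(a[1::2], omega_sq, q)
    out = [0] * n
    w = 1
    for k in range(n // 2):
        # Butterfly: combine the two half-size transforms.
        t = w * odd[k] % q
        out[k] = (even[k] + t) % q
        out[k + n // 2] = (even[k] - t) % q
        w = w * omega % q
    return out


def intt(a, omega, q):
    """Inverse NTT: forward transform with omega^-1, then scale by n^-1."""
    n = len(a)
    inv_omega = pow(omega, -1, q)   # modular inverse, Python 3.8+
    inv_n = pow(n, -1, q)
    return [x * inv_n % q for x in ntt(a, inv_omega, q)]


def poly_mul_cyclic(a, b, omega, q):
    """Multiply two polynomials modulo (x^n - 1, q) via pointwise products."""
    fa = ntt(a, omega, q)
    fb = ntt(b, omega, q)
    return intt([x * y % q for x, y in zip(fa, fb)], omega, q)


def schoolbook_cyclic(a, b, q):
    """O(n^2) reference implementation used only to check the NTT result."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            c[(i + j) % n] = (c[(i + j) % n] + a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    a = [1, 2, 3, 4, 0, 0, 0, 0]
    b = [5, 6, 7, 0, 0, 0, 0, 0]
    assert poly_mul_cyclic(a, b, OMEGA, Q) == schoolbook_cyclic(a, b, Q)
```

The transform replaces the quadratic schoolbook product with two forward transforms, one pointwise multiplication, and one inverse transform, which is the operation an NTT accelerator maps onto its processing elements.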

### What problem does this paper attempt to solve?

This paper aims to solve the problem of low computational efficiency in fully homomorphic encryption (FHE) caused by polynomials with many large-bit-width coefficients. Specifically:

1. **Computational Bottleneck**:
   - Polynomial multiplication is a basic operation in many applications, such as FHE. However, because the polynomial coefficients have a large bit-width (hundreds of bits), computational efficiency is low, which poses a significant challenge to the practical implementation of FHE.
   - Although the number theoretic transform (NTT) has been proven to be an effective tool for improving the efficiency of polynomial multiplication, there is currently a lack of fast and adaptable methods to generate NTT accelerators.
2. **Hardware Resource Optimization**:
   - The paper proposes a new NTT accelerator, HF-NTT, which can efficiently process polynomials of different degrees and moduli. By adjusting the number of processing elements (PEs), it achieves a balance between performance and hardware resources.
   - HF-NTT introduces a data movement strategy that eliminates the need for bit-reversal operations, resolves different types of hazards, and reduces the number of clock cycles.
   - The accelerator also includes a hardware-friendly modular multiplication design and configurable PEs, which can adjust their data paths according to task requirements, thus forming a general-purpose architecture.
3. **Limitations of Existing Methods**:
   - Existing hardware designs have limitations when processing polynomials with large-bit-width coefficients. For example, a fixed polynomial degree and a fixed, limited number of PEs restrict their applicability.
   - As design complexity increases, hardware solutions face challenges in performance and area efficiency when dealing with larger moduli and ciphertext lengths, and the control logic also becomes more difficult to implement.
   - Memory overhead is also a challenge. For example, Du et al. use a ping-pong mechanism to construct memory, resulting in complex control logic and double the storage area.

### Solutions

To solve these problems, the paper proposes HF-NTT, an innovative NTT accelerator. The main contributions include:

1. **New Data Storage and Movement Methods**:
   - Eliminate memory conflicts between PEs and memory, as well as stalls in PEs, in accelerators with different numbers of PEs.
2. **Use the Residue Number System (RNS) to Decompose Large-Bit-Width Coefficients**:
   - Enhance the adaptability and resource utilization of the hardware platform.
3. **Introduce Configurable Butterfly Units (CBUs)**:
   - Perform multiple operations, including NTT, multiplication, and INTT, and optimize the multiplier through the Barrett algorithm and DSPs to improve performance-area efficiency on FPGA.
4. **High-Efficiency Implementation**:
   - Implement a high-performance and area-efficient HF-NTT accelerator on the Xilinx Virtex-7 FPGA platform, and demonstrate its adaptability to large-scale designs through experiments and design parameter exploration.

Through these improvements, HF-NTT significantly improves the computational efficiency of polynomial multiplication in FHE, reduces hardware resource consumption, and improves the flexibility and adaptability of the system.
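
The RNS decomposition and Barrett-based modular multiplication mentioned above can be sketched in software as follows. This is a minimal illustration under stated assumptions, not the paper's hardware design: the moduli `97, 193, 257` are small primes picked for readability, the Barrett variant shown is one common textbook form (the exact variant and DSP mapping used in HF-NTT are not given in this summary), and all function names are hypothetical.

```python
from math import prod

# Illustrative, pairwise-coprime moduli (assumption; not from the paper).
MODULI = [97, 193, 257]


def barrett_setup(q):
    """Precompute k = bit length of q and mu = floor(2^(2k) / q)."""
    k = q.bit_length()
    mu = (1 << (2 * k)) // q
    return k, mu


def barrett_mulmod(a, b, q, k, mu):
    """Compute (a * b) mod q for 0 <= a, b < q without a division."""
    x = a * b                    # x < q^2 < 2^(2k)
    t = (x * mu) >> (2 * k)      # estimate of floor(x / q), at most 1 too small
    r = x - t * q                # 0 <= r < 2q
    if r >= q:                   # a single conditional correction suffices
        r -= q
    return r


def to_rns(x, moduli):
    """Split one large integer into small residues, one per modulus."""
    return [x % m for m in moduli]


def from_rns(residues, moduli):
    """Reconstruct the integer modulo prod(moduli) via the CRT."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)   # modular inverse, Python 3.8+
    return total % big_m


def rns_mul(x, y, moduli):
    """Multiply x * y modulo prod(moduli) using independent small channels."""
    out = []
    for rx, ry, m in zip(to_rns(x, moduli), to_rns(y, moduli), moduli):
        k, mu = barrett_setup(m)
        out.append(barrett_mulmod(rx, ry, m, k, mu))
    return from_rns(out, moduli)


if __name__ == "__main__":
    x, y = 123456, 7890
    assert rns_mul(x, y, MODULI) == (x * y) % prod(MODULI)
```

Each RNS channel works on a narrow word that is independent of the others, so wide coefficients can be handled by several small modular multipliers in parallel, and Barrett reduction replaces the costly division with multiplications and a shift.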

HF-NTT: Hazard-Free Dataflow Accelerator for Number Theoretic Transform
