High-Speed NTT Accelerator for CRYSTAL-Kyber and CRYSTAL-Dilithium

Trong-Hung Nguyen, Binh Kieu-Do-Nguyen, Cong-Kha Pham, Trong-Thuc Hoang
DOI: https://doi.org/10.1109/access.2024.3371581
IF: 3.9
2024-03-12
IEEE Access
Abstract: The efficiency of polynomial multiplication strongly affects the performance of lattice-based post-quantum cryptosystems. In this research, we propose a high-speed hardware architecture to accelerate polynomial multiplication based on the Number Theoretic Transform (NTT) in CRYSTAL-Kyber and CRYSTAL-Dilithium. We design a Digital Signal Processing (DSP) architecture for modular multiplication in butterfly and Point-Wise Multiplication (PWM) operations. Our method reduces the critical path delay of an n-bit multiplier to that of an (n-2)-bit adder, optimizing both area and speed. These dedicated DSPs are employed in the butterfly and PWM operations, completely eliminating the pre-processing and post-processing steps of the NTT transforms. Furthermore, we introduce a novel unified pipelined architecture for the NTT and Inverse NTT (INTT) transformations of Kyber and Dilithium, with corresponding high-speed (Radix-2) and ultra-high-speed (Radix-4) versions. Lastly, we construct a complete hardware accelerator for polynomial matrix-vector multiplication in Kyber. The Field-Programmable Gate Array (FPGA) implementation results show that our designs improve execution time by – for the NTT transforms in Dilithium and – for Kyber polynomial multiplication, compared with previously reported studies. Additionally, the hardware footprint results indicate that the proposed architectures achieve a better Area-Time-Product (ATP), corresponding to a 44%–96% improvement. The proposed architectures are efficient and well suited for high-performance lattice-based cryptography systems.
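For readers unfamiliar with the operation being accelerated, the following is a minimal software sketch of NTT-based polynomial multiplication in Z_q[x]/(x^n + 1): transform both inputs, multiply point-wise (the PWM step), and transform back. It is not the paper's hardware design. The transform here is a direct O(n^2) evaluation rather than a Radix-2 or Radix-4 butterfly pipeline, and the function names and the toy parameters (n = 4, q = 17, psi = 2) are assumptions chosen only for illustration. The sketch needs Python 3.8 or later for modular inverses via `pow`.

```python
def ntt(a, psi, q):
    """Negacyclic NTT by direct evaluation at the odd powers of psi.

    psi is assumed to be a primitive 2n-th root of unity mod q, so
    psi**(2k+1) for k = 0..n-1 are exactly the roots of x^n + 1.
    """
    n = len(a)
    return [sum(a[j] * pow(psi, (2 * k + 1) * j, q) for j in range(n)) % q
            for k in range(n)]


def intt(A, psi, q):
    """Inverse of ntt(): interpolate back to coefficient form."""
    n = len(A)
    n_inv = pow(n, -1, q)      # n^-1 mod q
    psi_inv = pow(psi, -1, q)  # psi^-1 mod q
    return [n_inv * sum(A[k] * pow(psi_inv, (2 * k + 1) * j, q)
                        for k in range(n)) % q
            for j in range(n)]


def polymul_ntt(a, b, psi, q):
    """Multiply a and b modulo (x^n + 1, q) via NTT, PWM, and INTT."""
    A, B = ntt(a, psi, q), ntt(b, psi, q)
    C = [(x * y) % q for x, y in zip(A, B)]  # point-wise multiplication
    return intt(C, psi, q)


def polymul_schoolbook(a, b, q):
    """Reference O(n^2) negacyclic multiplication (x^n wraps to -1)."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c


if __name__ == "__main__":
    # Toy parameters (illustrative only): 2 is a primitive 8th root of unity mod 17.
    a, b = [1, 2, 3, 4], [5, 6, 7, 8]
    print(polymul_ntt(a, b, psi=2, q=17))
    print(polymul_schoolbook(a, b, q=17))
```

Both calls should print the same coefficient list. A fast implementation replaces the direct evaluation in `ntt` and `intt` with log-depth butterfly stages, which is the part a pipelined hardware architecture targets.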
computer science, information systems; telecommunications; engineering, electrical & electronic