A High-Throughput and Scalable Schoolbook Polynomial Multiplier for Accelerating Saber on FPGA Using a Novel Winograd-Based Architecture

Jianfei Wang, Chen Yang, Fahong Zhang, Jia Hou, Yishuo Meng, Siwei Xiang, Yang Su
DOI: https://doi.org/10.1109/tcsii.2023.3339566
2023-01-01
IEEE Transactions on Circuits and Systems II: Express Briefs
Abstract: Polynomial multiplication is a significant bottleneck for Saber. The algorithms commonly used to speed it up are the number theoretic transform (NTT), Toom-Cook, and Schoolbook polynomial multiplication (SPM). Among them, SPM imposes no restrictions and is better suited to highly parallel architectures. However, existing designs do not offer superior performance and compatibility across different devices at the same time. We therefore propose a highly parallel and scalable Winograd-based SPM algorithm that reduces the number of loops and achieves a 32.18% reduction in multiplications. Based on it, we propose a high-throughput and scalable Schoolbook polynomial multiplier, named HSWM, to speed up the polynomial multiplication of Saber. Benefiting from the proposed algorithm and a highly parallel pipelined hardware architecture, HSWM supports multiple degrees of parallelism, enabling flexible scalability. Experimental results show that HSWM with eight parallel cores (HSWM-8) on the UltraScale+ platform performs one polynomial multiplication every 0.0336 μs at a 476 MHz clock frequency. Compared with current SPM-based designs, it achieves a 4.58× to 451.40× increase in throughput, a 1.18× to 9.67× increase in throughput per slice, and a 15.60% to 89.69% decrease in slice area-time complexity.
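For orientation, the operation that SPM performs can be sketched in a few lines of Python. This is only the plain schoolbook baseline, assuming Saber's standard ring Z_q[x]/(x^256 + 1) with q = 2^13; it is not the Winograd-based algorithm or the HSWM architecture described in the abstract, and the function name `schoolbook_negacyclic` is made up for illustration.

```python
N = 256        # polynomial length used by Saber (assumed from the Saber spec)
Q = 1 << 13    # Saber modulus q = 2^13 (assumed from the Saber spec)


def schoolbook_negacyclic(a, b, n=N, q=Q):
    """Multiply a and b in Z_q[x]/(x^n + 1) with the plain schoolbook method.

    a and b are lists of n integer coefficients, lowest degree first.
    Hypothetical helper for illustration; not the paper's algorithm.
    """
    if len(a) != n or len(b) != n:
        raise ValueError("both operands must have exactly n coefficients")
    c = [0] * n
    for i in range(n):
        for j in range(n):
            k = i + j
            prod = a[i] * b[j]
            if k >= n:
                # x^n = -1 in this ring, so the term wraps with a sign flip
                c[k - n] = (c[k - n] - prod) % q
            else:
                c[k] = (c[k] + prod) % q
    return c


if __name__ == "__main__":
    # Small demo with n = 4: x * x^3 = x^4 = -1 (mod x^4 + 1)
    x1 = [0, 1, 0, 0]
    x3 = [0, 0, 0, 1]
    print(schoolbook_negacyclic(x1, x3, n=4))
```

The double loop performs n² coefficient multiplications (65,536 for n = 256), and the inner iterations are independent of one another, which is why the schoolbook method maps naturally onto parallel hardware. The 32.18% reduction in multiplications reported in the abstract comes from the authors' Winograd-based reformulation, which this sketch does not attempt to reproduce.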