Fast Hardware Implementation for Extended GCD of Large Numbers in Redundant Representation

Lun Ou, Danyang Zhu, Jing Tian, Zhongfeng Wang
DOI: https://doi.org/10.1109/tcsii.2023.3253712
2023-01-01
IEEE Transactions on Circuits and Systems II: Express Briefs
Abstract: The extended greatest common divisor (XGCD) of large numbers is a commonly used, computation-intensive operation in cryptography. Growing demand for high-speed cryptographic applications calls for fast XGCD implementations. However, designing fast XGCD architectures is challenging because of the algorithm's complex operations and tight data dependencies. In this brief, we propose a low-latency, high-efficiency architecture for XGCD based on a modified extended $k$-ary algorithm in a redundant data representation. We modify the original $k$-ary algorithm by introducing a parameter $\delta$ to reduce computing delay and by applying a hardware-friendly method for computing the Bézout coefficients. The redundant signed digit (RSD) representation is adopted to avoid carry propagation and achieve a high clock frequency. We devise highly parallel, low-latency architectures for the proposed algorithms with $k = 2$, 4, and 8. The designs are coded in SystemVerilog and synthesized in TSMC 28-nm CMOS technology. Implementation results show that, for a data bit width of 1024, the design with $k = 8$ achieves the shortest latency among the evaluated values of $k$: only 472 ns, about 1.5× faster than the prior art.
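To make the XGCD operation concrete, here is a minimal Python sketch of a textbook extended binary GCD, i.e. the classic form of the $k = 2$ case. It is not the authors' modified $k$-ary algorithm: it has no parameter $\delta$ and does not reproduce their coefficient-update method, and it uses ordinary Python integers rather than the RSD representation, so carry propagation is not modeled at all. The function name `ext_binary_gcd` is hypothetical. It returns $g$, $s$, $t$ with $s \cdot a + t \cdot b = g = \gcd(a, b)$, where $s$ and $t$ are the Bézout coefficients.

```python
import math


def ext_binary_gcd(a: int, b: int) -> tuple[int, int, int]:
    """Return (g, s, t) with s*a + t*b == g == gcd(a, b), for a, b > 0.

    Textbook extended binary GCD: uses only parity tests, halvings, and
    subtractions (no division), which is why binary/k-ary variants suit hardware.
    """
    if a <= 0 or b <= 0:
        raise ValueError("inputs must be positive")
    shift = 0
    while a % 2 == 0 and b % 2 == 0:  # strip common factors of two
        a, b, shift = a // 2, b // 2, shift + 1
    u, v = a, b
    # Invariants: A*a + B*b == u and C*a + D*b == v (Bezout coefficients).
    A, B, C, D = 1, 0, 0, 1
    while u != 0:
        while u % 2 == 0:  # halve u; adjust (A, B) so they stay integers
            u //= 2
            if A % 2 == 0 and B % 2 == 0:
                A, B = A // 2, B // 2
            else:
                A, B = (A + b) // 2, (B - a) // 2
        while v % 2 == 0:  # same for v and (C, D)
            v //= 2
            if C % 2 == 0 and D % 2 == 0:
                C, D = C // 2, D // 2
            else:
                C, D = (C + b) // 2, (D - a) // 2
        if u >= v:  # subtract the smaller value from the larger one
            u, A, B = u - v, A - C, B - D
        else:
            v, C, D = v - u, C - A, D - B
    # Loop ends when u == 0; then v holds gcd of the odd-reduced inputs.
    return v << shift, C, D


if __name__ == "__main__":
    # Modular inverse, a typical cryptographic use of XGCD: 3^{-1} mod 11.
    g, s, _ = ext_binary_gcd(3, 11)
    assert g == 1 and (3 * (s % 11)) % 11 == 1
    print("inverse of 3 mod 11:", s % 11)
    # Bezout identity on a pair of ~1024-bit odd operands.
    x, y = (1 << 1023) + 12345, (1 << 1021) + 6789
    g, s, t = ext_binary_gcd(x, y)
    assert s * x + t * y == g == math.gcd(x, y)
```

When `g == 1`, `s % b` is the inverse of `a` modulo `b`, which is the main reason XGCD appears in cryptographic pipelines. The per-iteration work here is data-dependent (how many halvings occur before each subtraction), which hints at the tight data dependency the abstract identifies as the obstacle to fast hardware.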