Low-Latency Architecture for the Parallel Extended GCD Algorithm of Large Numbers

Danyang Zhu, Jing Tian, Zhongfeng Wang
DOI: https://doi.org/10.1109/iscas51556.2021.9401063
2021-01-01
Abstract: The extended Greatest Common Divisor (GCD) is an extension of the GCD operation that computes not only the GCD of integers a and b but also the Bézout coefficients, that is, integers x and y such that ax + by = GCD(a, b). Recently, the large-number extended GCD algorithm has been used in the core function of next-generation blockchain systems, where it is the most time-consuming operation. For the sake of efficiency, speeding up this operation is urgently desired. However, the extended GCD, which is rarely explored in the literature, is extremely hard to parallelize because of its long serial operations with strong data dependency. In this paper, we propose a low-latency architecture for the extended GCD of large numbers by applying a series of algorithmic transformations and architectural optimizations. First, a parallel extended GCD algorithm is studied and modified to be practical in hardware. Second, a highly parallel architecture is designed for the selected extended GCD algorithm, and the trade-off between computation latency and power consumption is evaluated. Finally, the architecture is coded in Verilog and synthesized in TSMC 28-nm CMOS technology. The experimental results for the 1024-bit extended GCD show that our design significantly outperforms the prior art.
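To make the operation defined in the abstract concrete, the following is a minimal sketch of the classic serial extended Euclidean algorithm in Python. It is not the parallel algorithm or the hardware architecture proposed in the paper; the function name `extended_gcd` and its variable names are assumptions chosen for illustration. The loop also shows the data dependency the abstract mentions: each iteration needs the remainder produced by the previous one.

```python
from typing import Tuple


def extended_gcd(a: int, b: int) -> Tuple[int, int, int]:
    """Return (g, x, y) such that a*x + b*y == g == gcd(a, b).

    Illustrative serial baseline for non-negative integers a and b
    (hypothetical helper, not the paper's parallel algorithm).
    """
    # Invariants kept on every iteration:
    #   a*old_x + b*old_y == old_r
    #   a*x     + b*y     == r
    old_r, r = a, b
    old_x, x = 1, 0
    old_y, y = 0, 1
    while r != 0:
        q = old_r // r
        # Each update depends on the previous remainder, so the
        # iterations cannot simply be run side by side.
        old_r, r = r, old_r - q * r
        old_x, x = x, old_x - q * x
        old_y, y = y, old_y - q * y
    return old_r, old_x, old_y
```

Because Python integers have arbitrary precision, the same function accepts 1024-bit operands; for example, `g, x, y = extended_gcd(a, b)` followed by checking `a * x + b * y == g` verifies the Bézout identity for any pair of inputs.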