Novel Systolic Implementation of Modular Multiplication for Large Operands

CHEN Hongyi,GAI Weixin
DOI: https://doi.org/10.3321/j.issn:1000-0054.1998.03.003
1998-01-01
Abstract: A novel systolic linear-array modular multiplier is presented that performs parallel modular multiplication based on Montgomery's algorithm. The total execution time for an n-bit modular multiplication is 2n+11 clock cycles. To further increase throughput, a three-stage pipeline architecture is adopted inside each processing element, so that one result bit is output every clock cycle once the pipeline is full. Each pipeline stage contains only the operation of a one-bit full adder. Moreover, because communication is purely nearest-neighbor, the interconnect delay is also very short, so the array can run at a high clock frequency. Each processing element is also simple, consisting mainly of four full adders and fourteen flip-flops. For an n-bit modular multiplication, the hardware cost is 46n + 184 gates. This linear systolic array is therefore optimized for both speed and area and is suitable for VLSI implementation. It can be used for modular exponentiation, which is a kernel operation in many public-key cryptosystems such as RSA. At a clock frequency of 200 MHz in 0.8 μm CMOS technology, the throughput can reach 129 kb/s with a single modular multiplier chip.
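As a point of reference for the arithmetic the array computes, the following is a minimal software sketch of bit-serial (radix-2) Montgomery multiplication. It is not the paper's hardware design: the function name `montgomery_multiply` and its parameters are hypothetical, and the sketch assumes an odd modulus m < 2^n with operands already reduced below m. Each loop iteration needs only an addition and a one-bit right shift, which is the kind of operation a chain of full adders can carry out.

```python
def montgomery_multiply(a: int, b: int, m: int, n: int) -> int:
    """Return a * b * 2**(-n) mod m (illustrative sketch).

    Assumptions: m is odd, m < 2**n, and 0 <= a, b < m.
    """
    if m % 2 == 0:
        raise ValueError("Montgomery multiplication requires an odd modulus")
    s = 0
    for i in range(n):
        a_i = (a >> i) & 1      # i-th bit of a, least significant first
        s += a_i * b            # conditionally add b
        q = s & 1               # quotient bit: 1 if s is currently odd
        s = (s + q * m) >> 1    # adding m (odd) makes s even, so the shift is exact
    # The loop keeps s < 2*m, so one conditional subtraction is enough.
    if s >= m:
        s -= m
    return s


# Example: with m = 13 and n = 4, the result r satisfies r * 2**4 = 7 * 9 (mod 13).
r = montgomery_multiply(7, 9, 13, 4)
print(r)
```

The result carries an extra factor of 2^(-n) modulo m. In modular exponentiation this is usually handled by converting the operands into Montgomery form once at the start and converting back once at the end.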