Radix-4 CORDIC algorithm based low-latency and hardware efficient VLSI architecture for Nth root and Nth power computations

Ankur Changela, Yogesh Kumar, Marcin Woźniak, Jana Shafi, Muhammad Fazal Ijaz
DOI: https://doi.org/10.1038/s41598-023-47890-3
2023-11-27
Abstract: In this article, a low-complexity VLSI architecture based on a radix-4 hyperbolic COordinate Rotation DIgital Computer (CORDIC) is proposed to compute the Nth root and Nth power of a fixed-point number. The most recent techniques use the radix-2 CORDIC algorithm to compute the root and power, and the high computation latency of radix-2 CORDIC is the primary concern for designers. In the proposed architecture, the Nth root and Nth power computations are divided into three phases, and each phase is performed by a different class of the proposed modified radix-4 CORDIC algorithms. Radix-4 CORDIC converges in fewer iterations. However, its intricate angle-selection logic and variable scale factor make it demand more hardware resources and computational steps. We employ the modified radix-4 hyperbolic vectoring (R4HV) CORDIC to compute the logarithm, radix-4 linear vectoring (R4LV) CORDIC to perform the division, and the modified scaling-free radix-4 hyperbolic rotation (R4HR) CORDIC to compute the exponential. The criteria for selecting the amount of rotation in R4HV CORDIC are complicated and depend on both coordinates [Formula: see text] and [Formula: see text] of the rotating vector. In the proposed modified R4HV CORDIC, we derive simple selection criteria from the fact that the inputs to R4HV CORDIC are related. The proposed criteria depend only on the coordinate [Formula: see text], which reduces the hardware complexity of the R4HV CORDIC. The R4HR CORDIC has a complex scale factor, and compensating for it requires complex hardware. The complexity of R4HR CORDIC is reduced by pre-computing the scale factor for the initial iterations and by employing scaling-free rotations for the later iterations. Quantitative hardware analysis indicates better hardware utilization than recent approaches.
The proposed architecture is implemented on a Virtex-6 FPGA. The FPGA implementation demonstrates [Formula: see text] less hardware utilization, with better error performance, than the approach based on the radix-2 CORDIC algorithm.
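The three-phase decomposition described in the abstract rests on the identities Nth root of M = exp(ln(M) / N) and M^N = exp(N ln(M)), where ln(M) = 2 atanh((M - 1) / (M + 1)). The Python below is a rough behavioral sketch of the root path only. It uses conventional radix-2 CORDIC in floating point, not the paper's radix-4 recurrences, simplified selection criteria, scaling-free rotations, or fixed-point datapath. The function names, the 40-iteration default, and the logarithm inputs x0 = M + 1 and y0 = M - 1 are assumptions made for illustration. Because radix-2 hyperbolic CORDIC has a limited convergence range, the sketch is only valid for roughly 0.11 < M < 9.3 and N >= 2.

```python
import math

# Behavioral model only: radix-2 CORDIC in floating point (an assumed
# stand-in for the paper's radix-4, fixed-point recurrences).

def hyperbolic_schedule(n):
    """Indices 1..n; indices 4, 13, 40, ... (k -> 3k + 1) are repeated once,
    which radix-2 hyperbolic CORDIC needs in order to converge."""
    seq, repeat = [], 4
    for i in range(1, n + 1):
        seq.append(i)
        if i == repeat:
            seq.append(i)
            repeat = 3 * repeat + 1
    return seq


def cordic_ln(m, n=40):
    """Phase 1, hyperbolic vectoring: drive y to 0 so z -> atanh(y0/x0).
    With x0 = m + 1 and y0 = m - 1, ln(m) = 2 * z."""
    x, y, z = m + 1.0, m - 1.0, 0.0
    for i in hyperbolic_schedule(n):
        d = -1.0 if y > 0 else 1.0          # x stays positive here
        x, y = x + d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atanh(2.0**-i)
    return 2.0 * z


def cordic_div(y, x, n=40):
    """Phase 2, linear vectoring: z -> y/x, valid for x > 0 and |y/x| < 2."""
    z = 0.0
    for i in range(n):
        d = -1.0 if y > 0 else 1.0
        y += d * x * 2.0**-i
        z -= d * 2.0**-i
    return z


def cordic_exp(z, n=40):
    """Phase 3, hyperbolic rotation: drive z to 0 so that
    x -> cosh(z0) and y -> sinh(z0); valid for |z0| < about 1.118."""
    sched = hyperbolic_schedule(n)
    # Pre-scale by the reciprocal of the (constant) radix-2 gain.
    gain = math.prod(math.sqrt(1.0 - 4.0**-i) for i in sched)
    x, y = 1.0 / gain, 0.0
    for i in sched:
        d = 1.0 if z >= 0 else -1.0
        x, y = x + d * y * 2.0**-i, y + d * x * 2.0**-i
        z -= d * math.atanh(2.0**-i)
    return x + y                            # cosh + sinh = exp


def nth_root(m, n_root, iters=40):
    """M**(1/N) = exp(ln(M) / N), one CORDIC mode per phase."""
    ln_m = cordic_ln(m, iters)
    return cordic_exp(cordic_div(ln_m, float(n_root), iters), iters)


print(nth_root(8.0, 3))    # close to 2.0
```

A hardware datapath would replace the multiplications by 2^-i with arithmetic shifts and read the atanh(2^-i) constants from a small lookup table. In radix-2 the hyperbolic gain is a single constant that can be pre-scaled once. The abstract notes that radix-4 lacks this property, because its scale factor varies with the selected rotations.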