Fast Radix: A Scalable Hardware Accelerator for Parallel Radix Sort

Xingyu Liu, Yangdong Deng
DOI: https://doi.org/10.1109/fit.2014.48
2014-01-01
Abstract: Sorting is one of the most fundamental algorithms in computer science and is the bottleneck of many computing problems. Fast and stable sorting of large-scale data has been an optimization goal for many applications. CPU-only software methods often fail to tap the potential of compute-intensive sorting algorithms with parallel traits. This motivates us to use a dedicated hardware accelerator to exploit the algorithm's parallelism and expedite the sorting process. Among the successful sorting algorithms proposed over the past few decades, the parallel version of radix sort is the best suited for hardware acceleration. In this paper, we propose Fast Radix, a scalable hardware accelerator for 32-bit integer radix sort. The proposed accelerator is integrated into the CPU microarchitecture as a special functional unit and is started with a special instruction. Once launched, it interacts directly with the MMU and DTLB of the CPU. The accelerator has a queued, pipelined structure to fully exploit the memory bandwidth. The accelerator was evaluated with RAMP Gold, an FPGA Architecture Model Execution simulator. Experimental results show that our radix sort accelerator outperforms its CPU software equivalent by a factor of more than 5.
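For readers unfamiliar with the algorithm being accelerated, the following is a minimal software sketch of least-significant-digit radix sort on unsigned 32-bit integers. It is not the Fast Radix design: the function name `radix_sort_u32` and the 8-bit digit width are assumptions chosen for illustration, and the abstract does not state which digit width the accelerator uses. Each pass consists of a histogram, a prefix sum, and a stable scatter; these per-pass phases are the parts that parallel versions of radix sort typically distribute across workers.

```python
def radix_sort_u32(keys, digit_bits=8):
    """LSD radix sort for unsigned 32-bit integers (illustrative sketch).

    Assumes every key lies in [0, 2**32). The digit width is a
    hypothetical parameter; it must divide 32 evenly.
    """
    if digit_bits <= 0 or 32 % digit_bits != 0:
        raise ValueError("digit_bits must be a positive divisor of 32")

    radix = 1 << digit_bits   # number of buckets per pass
    mask = radix - 1          # extracts one digit after shifting
    src = list(keys)

    # One pass per digit, least significant digit first.
    for shift in range(0, 32, digit_bits):
        # Phase 1: histogram of the current digit.
        counts = [0] * radix
        for key in src:
            counts[(key >> shift) & mask] += 1

        # Phase 2: exclusive prefix sum gives each bucket's start index.
        offsets = [0] * radix
        total = 0
        for digit in range(radix):
            offsets[digit] = total
            total += counts[digit]

        # Phase 3: stable scatter; equal digits keep their input order,
        # which is what makes the multi-pass sort correct.
        dst = [0] * len(src)
        for key in src:
            digit = (key >> shift) & mask
            dst[offsets[digit]] = key
            offsets[digit] += 1

        src = dst

    return src
```

With the assumed 8-bit digits, a 32-bit key is sorted in four passes, and each pass reads and writes the whole array once, which is why memory bandwidth dominates the cost.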