The implementation and optimization of Bitonic sort algorithm based on CUDA

Qi Mu, Liqing Cui, Yufei Song
DOI: https://doi.org/10.48550/arXiv.1506.01446
2015-06-04
Abstract: This paper describes the bitonic sort algorithm in detail and implements it on the CUDA architecture. We also apply two effective optimizations to the implementation details, based on the characteristics of the GPU, which greatly improve its efficiency. Finally, we measure the speedup of the optimized bitonic sort on the GPU over quicksort on the CPU. Although quicksort is not well suited to parallel implementation, it is, to some extent, more efficient than other sorting algorithms on the CPU. Hence, to assess speedup and performance, we compare bitonic sort on the GPU with quicksort on the CPU. For arrays of random 32-bit integers, the experimental results show a speedup of nearly 20 times, and when the array size is about 2^16, the speedup reaches up to 30.
Distributed, Parallel, and Cluster Computing