A Parallel Quicksort Algorithm on Manycore Processors in Sunway TaihuLight

Siyuan Ren, Shizhen Xu, Guangwen Yang
DOI: https://doi.org/10.1007/978-3-319-93713-7_61
2018-01-01
Abstract: In this paper, we present a highly efficient parallel quicksort algorithm for SW26010, the heterogeneous manycore processor that makes Sunway TaihuLight the top-ranked supercomputer in the world. Motivated by the software-cache and on-chip communication design of SW26010, we propose a two-phase quicksort algorithm in which the first phase counts elements and the second phase moves them. To make the best use of this manycore architecture, we design a decentralized workflow, further optimize memory access, and balance the workload. Experiments show that our algorithm scales efficiently to 64 cores of SW26010, achieving more than 32X speedup for int32 elements on all kinds of data distributions. This result exceeds the strong-scaling result of the Intel TBB (Threading Building Blocks) version of quicksort on the x86-64 architecture.
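The two-phase idea (count first, then move) can be illustrated with a small sequential sketch. This is not the authors' SW26010 implementation: the function names `two_phase_partition` and `quicksort`, the `num_workers` parameter, and the chunking scheme are all assumptions made for illustration, and the "workers" are simulated by an ordinary loop rather than real cores. The point it shows is that once every worker has counted its elements on each side of the pivot, a prefix sum gives each worker private, non-overlapping write ranges, so the move phase needs no coordination between workers.

```python
from itertools import accumulate


def two_phase_partition(data, pivot, num_workers=4):
    """Partition `data` around `pivot` with a count pass and a move pass.

    Hypothetical illustration: workers are simulated sequentially.
    Returns (partitioned_list, number_of_elements_below_pivot).
    """
    n = len(data)
    # Split the input into contiguous chunks, one per simulated worker.
    bounds = [n * w // num_workers for w in range(num_workers + 1)]
    chunks = [data[bounds[w]:bounds[w + 1]] for w in range(num_workers)]

    # Phase 1 (count): each worker counts its elements below the pivot.
    low_counts = [sum(1 for x in chunk if x < pivot) for chunk in chunks]
    high_counts = [len(c) - k for c, k in zip(chunks, low_counts)]

    # Exclusive prefix sums turn the counts into per-worker write offsets.
    total_low = sum(low_counts)
    low_offsets = [0] + list(accumulate(low_counts))[:-1]
    high_offsets = [total_low + s
                    for s in [0] + list(accumulate(high_counts))[:-1]]

    # Phase 2 (move): each worker writes only into its own disjoint ranges.
    out = [None] * n
    for w, chunk in enumerate(chunks):
        lo, hi = low_offsets[w], high_offsets[w]
        for x in chunk:
            if x < pivot:
                out[lo] = x
                lo += 1
            else:
                out[hi] = x
                hi += 1
    return out, total_low


def quicksort(data, num_workers=4):
    """Quicksort built on the two-phase partition above (illustrative only)."""
    if len(data) <= 1:
        return list(data)
    pivot = data[len(data) // 2]
    out, split = two_phase_partition(data, pivot, num_workers)
    low, high = out[:split], out[split:]
    # Pull elements equal to the pivot out so the recursion always shrinks.
    equal = [x for x in high if x == pivot]
    greater = [x for x in high if x > pivot]
    return quicksort(low, num_workers) + equal + quicksort(greater, num_workers)
```

For example, `two_phase_partition([5, 1, 9, 3, 7, 2, 8], 5, num_workers=3)` places the three elements below 5 in the first three output slots and the rest after them. On real hardware the count and move loops would run concurrently across cores; how the paper maps them onto SW26010's software cache and on-chip communication is not reproduced here.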