Efficient Parallelization of a Two-List Algorithm for the Subset-Sum Problem on a Hybrid CPU/GPU Cluster

Letian Kang,Lanjun Wan,Kenli Li
DOI: https://doi.org/10.1109/PAAP.2014.44
2014-01-01
Abstract:Recently, hybrid CPU/GPU cluster has been widely used to deal with compute-intensive problems, such as the subset-sum problem. The two-list algorithm is a well known approach to solve the problem. However, a hybrid MPI-CUDA dual-level parallelization of the algorithm on the cluster is not straightforward. The key challenge is how to allocate the most suitable workload to each node to achieve good load balancing between nodes and minimize the communication overhead. Therefore, this paper proposes an effective workload distribution scheme which aims to reasonably assign workload to each node. According to this scheme, an efficient MPI-CUDA parallel implementation of a two-list algorithm is presented. A series of experiments are conducted to compare the performance of the hybrid MPI-CUDA implementation with that of the best sequential CPU implementation, the single-node CPU-only implementation, the single-node GPU-only implementation, and the hybrid MPI-OpenMP implementation with same cluster configuration. The results show that the proposed hybrid MPI-CUDA implementation not only offers significant performance benefits but also has excellent scalability.
What problem does this paper attempt to address?