Accelerating MPI Collectives with Process-in-Process-based Multi-object Techniques
Jiajun Huang, Kaiming Ouyang, Yujia Zhai, Jinyang Liu, Min Si, Kenneth Raffenetti, Hui Zhou, A. Hori, Zizhong Chen, Yan-Hua Guo, R. Thakur
DOI: https://doi.org/10.1145/3588195.3595955
2023-05-17
Abstract: In the exascale computing era, optimizing MPI collective performance in high-performance computing (HPC) applications is critical. Current algorithms suffer performance degradation from system call overhead, page faults, or data-copy latency, which limits the efficiency and scalability of HPC applications. To address these issues, we propose PiP-MColl, a Process-in-Process-based Multi-object Inter-process MPI Collective design that maximizes small-message MPI collective performance at scale. PiP-MColl features efficient multi-sender and multi-receiver collective algorithms and leverages Process-in-Process shared-memory techniques to eliminate unnecessary system calls, page-fault overhead, and extra data copies, improving intra- and inter-node message rate and throughput. Our design also boosts performance for larger messages, yielding improvements across a wide range of message sizes. Experimental results show that PiP-MColl outperforms popular MPI libraries, including OpenMPI, MVAPICH2, and Intel MPI, by up to 4.6X for MPI collectives such as MPI_Scatter and MPI_Allgather.
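For readers unfamiliar with the two collectives named in the abstract, the following is a minimal sketch of their data-movement semantics only, written in plain Python with no MPI dependency. It is not the PiP-MColl algorithm and does not model shared memory, multiple senders, or networking; the function names `scatter` and `allgather` and the list-of-lists representation of per-rank buffers are assumptions made for illustration.

```python
from typing import List, Sequence


def scatter(send_buf: Sequence[int], num_ranks: int) -> List[List[int]]:
    """Model MPI_Scatter semantics: the root's buffer is split into
    equal-sized chunks, and rank r receives chunk r."""
    if num_ranks <= 0 or len(send_buf) % num_ranks != 0:
        raise ValueError("send_buf length must be a multiple of num_ranks")
    chunk = len(send_buf) // num_ranks
    return [list(send_buf[r * chunk:(r + 1) * chunk]) for r in range(num_ranks)]


def allgather(per_rank: Sequence[Sequence[int]]) -> List[List[int]]:
    """Model MPI_Allgather semantics: every rank ends up with the
    concatenation of all ranks' contributions, in rank order."""
    gathered = [x for contribution in per_rank for x in contribution]
    # Each rank gets its own copy of the full gathered buffer.
    return [list(gathered) for _ in per_rank]


if __name__ == "__main__":
    chunks = scatter(list(range(8)), num_ranks=4)
    print(chunks)
    print(allgather(chunks))
```

In a real MPI library each of these steps involves inter-process and inter-node transfers; the overheads the abstract lists (system calls, page faults, extra copies) arise in those transfers, which this sketch deliberately leaves out.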
Computer Science