BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System

Haopeng Huang, Yuyang Jin, Wei Xue
DOI: https://doi.org/10.1145/3673038.3673131
2024-01-01
Abstract: MPI neighborhood communication with sparse and imbalanced patterns is common in process-level parallel programs. However, these programs often suffer significant slowdowns on today’s many-core clusters, which feature dozens of cores per node. There are two key causes. First, there is substantial contention for memory and network ports when a large number of processes access the MPI library simultaneously. Second, many neighborhood communication patterns do not align well with the many-core architecture, resulting in performance bottlenecks that could be mitigated. In this paper, we leverage communication patterns to address these issues in neighborhood communication. We use zero redundant copy and message aggregation to optimize intra-node communication, and relieve both intra-node and inter-node bottlenecks with process mapping. By combining these optimizations, we present BoostN, a standalone library that speeds up imbalanced neighborhood communication on many-core systems. BoostN works with mainstream homogeneous architectures and with recent versions of various MPI libraries. Experiments show that BoostN achieves a geometric mean speedup of up to 4.94x for SpMV on 2,708 matrices from SuiteSparse, up to 8.18x speedup for the Laser problem (latency-bounded), and up to 8.98x speedup for the Oil problem (bandwidth-bounded), both solved with Hypre.
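The abstract does not describe how BoostN aggregates messages, so the following is only a generic illustration of the message aggregation idea, not the authors' algorithm. It is a minimal Python sketch that assumes a block process mapping (rank r lives on node r // ranks_per_node) and merges all messages travelling between the same pair of nodes into one. The function name `aggregate_by_node` and the message tuples are hypothetical.

```python
from collections import defaultdict


def aggregate_by_node(messages, ranks_per_node):
    """Merge inter-node messages that share a (source node, destination node) pair.

    messages: iterable of (src_rank, dst_rank, nbytes) tuples (hypothetical format).
    Assumption: block mapping, i.e. rank r is placed on node r // ranks_per_node.
    Returns (intra, inter):
      intra - list of messages whose endpoints are on the same node
      inter - dict mapping (src_node, dst_node) to the total bytes to send
    """
    intra = []
    inter = defaultdict(int)
    for src, dst, nbytes in messages:
        src_node = src // ranks_per_node
        dst_node = dst // ranks_per_node
        if src_node == dst_node:
            # Stays on the node; a shared-memory path could handle it.
            intra.append((src, dst, nbytes))
        else:
            # One aggregated message per node pair instead of one per rank pair.
            inter[(src_node, dst_node)] += nbytes
    return intra, dict(inter)


# Five rank-to-rank messages on two nodes with four ranks each.
messages = [(0, 4, 100), (1, 5, 50), (2, 4, 25), (0, 1, 10), (5, 0, 8)]
intra, inter = aggregate_by_node(messages, ranks_per_node=4)
print(intra)  # messages that never leave a node
print(inter)  # aggregated inter-node traffic
```

In this toy input, four rank-level inter-node messages collapse into two node-level messages, which is the kind of reduction in network port contention that aggregation aims for.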