Extending τ-Lop to Model MPI Blocking Primitives on Shared Memory

Wang Ziheng, Chen Heng, Dong Xiaoshe, Cai Weilin, Kang Yan, Zhang Xingjun
DOI: https://doi.org/10.1007/s11227-022-04352-3
IF: 3.3
2022-01-01
The Journal of Supercomputing
Abstract: MPI communication optimization is essential for high-performance applications. Communication performance models have helped improve the efficiency of collective algorithms and optimize communication scheduling. Instead of modeling communication with hardware-related parameters such as bandwidth and latency, recent studies have focused on software models, which simplify modeling by representing a transmission as a sequence of implicit transfers. As a state-of-the-art software model, τ-Lop adopts the concept of concurrent transfers for modeling on multiple platforms. However, τ-Lop models only the system as a whole, not individual MPI primitives, which makes it difficult to apply in systems where processes have different costs. Because the demand for high-precision modeling of concurrent communication is increasing, we extend τ-Lop to model individual MPI primitives, which handles this situation and covers additional cases such as asynchronous communication. Modeling accuracy improves once factors such as concurrent transmission, waiting time, communication ends, channels, and protocols are considered. In tests of point-to-point and concurrent communication, the relative error of our model is less than 40%, and its accuracy is more than 100% higher than that of the original τ-Lop model in most cases, which means that our work can be used for practical optimization.