Ziheng Wang, Heng Chen, Xiaoshe Dong, Weilin Cai, Xingjun Zhang
DOI: https://doi.org/10.1016/j.future.2022.02.004
IF: 7.307
2022-01-01
Future Generation Computer Systems
Abstract: One-sided communication (also known as remote memory access, or RMA) in the Message Passing Interface (MPI) was introduced in MPI-2 (1997) and enables new, more efficient programming models. MPI-3 introduced more flexible and efficient primitives, which make one-sided communication easier to use and deploy. However, compared with traditional two-sided communication, little work has been done on analyzing the cost of one-sided communication, which calls for formal analysis. A communication performance model is a formal analysis of communication and its cost. We focus on the software model, whose core idea is that a transmission can be represented as a sequence of implicit transfers and data movements. This approach is well suited to modeling concurrent communication. We propose LogSC, which consists of the window cost, transmission cost, synchronization cost, and computational cost in atomic operations. In this paper, LogSC is used to model most of the operations in one-sided communication, including put/get operations, atomic operations, and MPI shared memory programming. We model and evaluate the parallel tests of IMB, collectives designed by combining MPI and MPI shared memory (MPI+MPI), and the communication in the scalable universal matrix multiplication algorithm (SUMMA), a common matrix multiplication algorithm. Experiments show that our model has high accuracy, filling a gap left by existing models.
• We present an MPI performance model named LogSC for modeling one-sided communication.
• The goal of LogSC is to help optimize parallel applications and parallel algorithms.
• LogSC can model most operations in RMA, including put/get operations, atomic operations, and shared memory programming.
• We provide the modeling of concurrent communication, including parallel algorithms and an MPI+MPI code.