Frustrated with MPI+Threads? Try MPIxThreads!

Hui Zhou, Ken Raffenetti, Junchao Zhang, Yanfei Guo, Rajeev Thakur
DOI: https://doi.org/10.1145/3615318.3615320
2024-01-30
Abstract: MPI+Threads, embodied by the MPI/OpenMP hybrid programming model, is a parallel programming paradigm where threads are used for on-node shared-memory parallelization and MPI is used for multi-node distributed-memory parallelization. OpenMP provides an incremental approach to parallelize code, while MPI, with its isolated address space and explicit messaging API, affords straightforward paths to obtain good parallel performance. However, MPI+Threads is not an ideal solution. Since MPI is unaware of the thread context, it cannot be used for interthread communication. This results in duplicated efforts to create separate and sometimes nested solutions for similar parallel tasks. In addition, because the MPI library is required to obey message-ordering semantics, mixing threads and MPI via MPI_THREAD_MULTIPLE can easily result in miserable performance due to accidental serializations.
Distributed, Parallel, and Cluster Computing
What problem does this paper attempt to address?
### Problems Addressed by the Paper

This paper addresses shortcomings of the current MPI+Threads programming model and proposes an extension, MPI×Threads (i.e., the product of MPI and threads).

#### Issues with the Current MPI+Threads Model

1. **Communication Limitation**: MPI is unaware of the thread context, so it cannot be used directly for inter-thread communication.
2. **Code Duplication**: Because MPI and the threading model each handle their own level of parallelism, similar parallel tasks end up with separate, sometimes nested, solutions.
3. **Performance Issues**: With `MPI_THREAD_MULTIPLE`, the MPI library must still obey message-ordering semantics, which can accidentally serialize threads and severely degrade performance.

#### Newly Proposed Solution

The paper proposes a new MPI extension, `MPIX Thread Communicator` (abbreviated as `threadcomm`), which allows threads to be assigned unique MPI ranks within a thread parallel region, thereby enabling inter-thread communication through MPI. In this way, OpenMP and MPI can work together, simplifying the code and improving performance.

#### Main Contributions

1. **Unified Environment**: Combines MPI processes and OpenMP threads into a unified parallel environment.
2. **New API**: Introduces new APIs, such as `MPIX_Threadcomm_init`, `MPIX_Threadcomm_start`, etc., allowing threads to directly use MPI for communication within parallel regions.
3. **Performance Optimization**: Presents preliminary performance results showing that the new method can match or even exceed the performance of pure MPI or pure OpenMP in certain cases.

In summary, this paper aims to overcome the shortcomings of the current MPI+Threads model by introducing a new MPI extension, providing a more efficient and concise parallel programming approach.
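To make the idea of giving every thread its own rank more concrete, the following minimal sketch shows one way the combined rank space could be laid out. It is an illustration under stated assumptions, not the paper's or any MPI library's implementation: it assumes every process contributes the same number of threads and that ranks are ordered by process first, then by thread. The names `thread_location`, `threadcomm_size`, `threadcomm_rank`, `threadcomm_locate`, and `same_process` are hypothetical helpers introduced only for this example; the sketch makes no MPI calls.

```c
#include <assert.h>

/* Hypothetical helpers for illustration only; none of these are MPI or
 * MPIX functions. Assumption: each process contributes the same number
 * of threads, and ranks are ordered by process first, then by thread. */

typedef struct {
    int proc_rank;  /* rank of the owning MPI process */
    int thread_id;  /* index of the thread within that process */
} thread_location;

/* Total number of ranks when every thread of every process gets one. */
int threadcomm_size(int num_procs, int threads_per_proc) {
    assert(num_procs > 0 && threads_per_proc > 0);
    return num_procs * threads_per_proc;
}

/* Unique rank for a (process, thread) pair under the assumed layout. */
int threadcomm_rank(int proc_rank, int thread_id, int threads_per_proc) {
    assert(threads_per_proc > 0);
    assert(proc_rank >= 0);
    assert(thread_id >= 0 && thread_id < threads_per_proc);
    return proc_rank * threads_per_proc + thread_id;
}

/* Inverse mapping: recover the owning process and thread from a rank. */
thread_location threadcomm_locate(int rank, int threads_per_proc) {
    assert(rank >= 0 && threads_per_proc > 0);
    thread_location loc = { rank / threads_per_proc, rank % threads_per_proc };
    return loc;
}

/* Nonzero if two ranks live in the same process, i.e. a message between
 * them is inter-thread rather than inter-process. */
int same_process(int rank_a, int rank_b, int threads_per_proc) {
    return threadcomm_locate(rank_a, threads_per_proc).proc_rank ==
           threadcomm_locate(rank_b, threads_per_proc).proc_rank;
}
```

Under these assumptions, 2 processes with 4 threads each form a rank space of size 8, and thread 2 of process 1 would have rank 6. A check like `same_process` hints at why a single rank space is attractive: the same send/receive code can cover both the inter-thread and the inter-process case, with the library free to choose the transport.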