Abstract: The C++ programming language and its cousins lean towards a memory-inefficient storage of structs: The compiler inserts helper bits into the struct such that individual attributes align with bytes, and it adds additional bytes to align attributes with cache lines, while it is not able to exploit knowledge about the range of integers, enums or bitsets to bring the memory footprint down. Furthermore, the language provides neither support for data exchange via MPI nor for arbitrary floating-point precision formats. If developers need a low memory footprint and MPI datatypes over structs which exchange only minimal data, they have to manipulate the data and write the MPI datatypes manually. We propose a C++ language extension based upon C++ attributes through which developers can tell the compiler which memory arrangements would be beneficial: Can multiple booleans be squeezed into one bit field, do floats hold fewer significant bits than in the IEEE standard, or does the code require a user-defined MPI datatype for certain subsets of attributes? The extension offers the opportunity to fall back to normal alignment and padding rules via plain C++ assignments, no dependencies upon external libraries are introduced, and the resulting code remains standard C++. Our work implements the language annotations within LLVM and demonstrates their potential impact, both upon the runtime and the memory footprint, through smoothed particle hydrodynamics (SPH) benchmarks. They uncover the potential gains in terms of performance and development productivity.
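The padding effect described in the abstract can be made visible with plain standard C++. The sketch below does not use the proposed annotations (their syntax is not given here). It packs the struct by hand with bit-fields and member reordering, which is roughly the manual work the extension aims to automate. The struct and member names (`PaddedParticle`, `PackedParticle`, `isActive`, and so on) are hypothetical and chosen only for illustration.

```cpp
#include <cstdint>
#include <cstdio>

// Conventional layout: each bool occupies a full byte, and the compiler
// inserts padding so that every double is suitably aligned.
struct PaddedParticle {
    bool   isActive;
    double position;
    bool   isBoundary;
    double density;
    bool   hasMoved;
};

// Hand-packed layout (hypothetical example, not the paper's generated code):
// the three flags share a single byte via bit-fields, and the doubles are
// placed first so that padding is only needed at the end of the struct.
struct PackedParticle {
    double       position;
    double       density;
    std::uint8_t isActive   : 1;
    std::uint8_t isBoundary : 1;
    std::uint8_t hasMoved   : 1;
};

// The packed variant must never be larger than the padded one.
static_assert(sizeof(PackedParticle) <= sizeof(PaddedParticle),
              "hand-packed layout should not grow the struct");

// Prints both sizes; the exact numbers depend on the platform ABI.
inline void print_layout_sizes() {
    std::printf("padded: %zu bytes, packed: %zu bytes\n",
                sizeof(PaddedParticle), sizeof(PackedParticle));
}
```

Calling `print_layout_sizes()` from `main` shows the difference on a given platform. The exact sizes are ABI-dependent, so the sketch only asserts that the packed struct is not larger.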

What problem does this paper attempt to address?

This paper addresses shortcomings of the C++ language in high-performance computing (HPC), in particular those related to memory efficiency and MPI (Message Passing Interface) support. It focuses on the following aspects:

1. **Large memory footprint**:
   - When laying out structures (`struct`), the C++ compiler inserts padding bytes to align member variables, which wastes memory. For example, a boolean value needs only 1 bit, but is usually allocated 8 bits (one byte), which reduces the information density.
   - The known value ranges of enumeration types (`enum`) and integer types are not exploited, so these types occupy more memory than their values require.
2. **Insufficient data precision control**:
   - The C++ language lacks support for "continuous" data precision, that is, it cannot flexibly specify the number of significant bits of floating-point numbers. Developers cannot express the precision their integers or floating-point numbers actually require, which affects memory footprint and performance.
3. **Imperfect MPI support**:
   - The C++ language itself has no built-in support for distributed-memory parallelization via MPI. Developers have to map structure members onto MPI datatypes manually, a process that is error-prone and time-consuming. In addition, different subsets of structure members require different MPI views, which further increases the complexity.

### Solutions

To address these problems, the paper proposes the following:

1. **Introduce new C++ annotations**:
   - Developers use these annotations to tell the compiler how to optimize the memory layout. For example, multiple boolean values can be packed into one bit field, which reduces the memory footprint.
   - For integers and floating-point numbers, developers can specify the valid range or the required precision, which enables the compiler to store these data in fewer bits.
2. **Improve MPI support**:
   - The annotations model MPI datatypes explicitly: developers can define different MPI views for different subsets of structure members, which simplifies cross-node data exchange.
3. **Maintain code compatibility**:
   - The extensions are optional and do not change the behavior of existing code. If the compiler does not support the annotations, it ignores them and the code still runs normally.

### Experimental verification

The paper demonstrates the potential impact of these extensions through smoothed particle hydrodynamics (SPH) benchmarks, which show gains in memory footprint, runtime, and development productivity.

In summary, the paper tackles low memory efficiency and insufficient MPI support in C++ for high-performance computing by introducing C++ language extensions based upon attributes, implemented within LLVM.
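To illustrate what a reduced floating-point precision means in practice, the following sketch emulates it by hand in standard C++. It is not the paper's implementation. It assumes that `float` is IEEE 754 binary32 and that the surplus mantissa bits are simply truncated (a real compiler might round instead). The function names `truncate_mantissa` and `stored_bits` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assumption: float is IEEE 754 binary32 (1 sign, 8 exponent, 23 mantissa bits).
static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");

// Keep only the `keptBits` most significant mantissa bits of `value`
// and zero the remaining ones (truncation, not rounding).
inline float truncate_mantissa(float value, int keptBits) {
    assert(keptBits >= 0 && keptBits <= 23);
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // well-defined bit copy
    const std::uint32_t dropped = 23u - static_cast<std::uint32_t>(keptBits);
    const std::uint32_t mask = ~((std::uint32_t{1} << dropped) - 1u);
    bits &= mask;                             // clear the low mantissa bits
    float result;
    std::memcpy(&result, &bits, sizeof result);
    return result;
}

// Bits that have to be stored once the dropped mantissa bits are omitted:
// sign + exponent + kept mantissa bits.
inline int stored_bits(int keptBits) {
    return 1 + 8 + keptBits;
}
```

For example, `truncate_mantissa(1.75f, 1)` yields `1.5f`, because only the leading mantissa bit survives, and `stored_bits(7)` is 16, so such a value would fit into two bytes instead of four.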

An extension of C++ with memory-centric specifications for HPC to reduce memory footprints and streamline MPI development
