CMLB: a Communication-aware and Memory Load Balance Mapping Optimization for Modern NUMA Systems

Jingbo Li, Yuxin Zhang, Xingjun Zhang
DOI: https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00099
2021-01-01
Abstract: For parallel applications, mapping threads to cores according to their memory access behavior plays an important role in optimizing application performance. On modern non-uniform memory access (NUMA) architectures, an imbalance between thread communication and memory bandwidth severely increases the average latency and the execution time of the application. Previous studies on thread mapping mostly focus on the locality of memory accesses to improve communication efficiency. However, maximizing locality may cause memory congestion because memory bandwidth becomes imbalanced between nodes. In this paper, a communication-aware and memory load balance mapping algorithm (CMLB) for modern NUMA systems is proposed, which improves the locality of communication while avoiding memory congestion. To verify the effectiveness of the algorithm, applications from the NAS Parallel Benchmarks and the PARSEC benchmark suite are used. Experimental results show that CMLB can greatly improve the balance of memory bandwidth between nodes, reducing memory latency, and can also improve the locality of communication, achieving better performance than state-of-the-art mapping methods.
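The trade-off the abstract describes, between keeping communicating threads on the same node and spreading memory load across nodes, can be illustrated with a small sketch. This is not the CMLB algorithm, whose details the abstract does not give; it is a simple greedy heuristic written only for illustration. The function name `map_threads`, the weight `alpha`, the communication matrix `comm`, and the per-thread memory demand `mem_load` are all hypothetical.

```python
from typing import List


def map_threads(comm: List[List[int]], mem_load: List[float],
                num_nodes: int, alpha: float = 1.0) -> List[int]:
    """Greedy illustration (not CMLB): assign each thread to a NUMA node.

    comm[i][j]  : assumed communication volume between threads i and j
    mem_load[i] : assumed memory bandwidth demand of thread i
    alpha       : hypothetical weight of the load penalty against locality
    """
    n = len(comm)
    capacity = -(-n // num_nodes)  # ceiling division: threads allowed per node
    node_of = [-1] * n
    node_threads: List[List[int]] = [[] for _ in range(num_nodes)]
    node_load = [0.0] * num_nodes

    # Place the heaviest memory consumers first so they get spread out.
    order = sorted(range(n), key=lambda t: -mem_load[t])
    for t in order:
        best, best_score = None, None
        for node in range(num_nodes):
            if len(node_threads[node]) >= capacity:
                continue  # node has no free core left
            # Locality gain: communication with threads already on this node.
            locality = sum(comm[t][u] for u in node_threads[node])
            # Penalize nodes that already carry a high memory load.
            score = locality - alpha * node_load[node]
            if best_score is None or score > best_score:
                best, best_score = node, score
        node_of[t] = best
        node_threads[best].append(t)
        node_load[best] += mem_load[t]
    return node_of


# Toy input: threads 0-1 and 2-3 communicate heavily; 0 and 2 are memory-hungry.
comm = [
    [0, 10, 0, 0],
    [10, 0, 0, 0],
    [0, 0, 0, 10],
    [0, 0, 10, 0],
]
mem_load = [5.0, 1.0, 5.0, 1.0]
mapping = map_threads(comm, mem_load, num_nodes=2)
print(mapping)  # [0, 0, 1, 1]
```

In this toy input the two memory-hungry threads land on different nodes, and each is then joined by its communication partner, so both locality and load balance are satisfied. With real workloads the two goals conflict, which is the situation the paper targets.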