Programming Bare-Metal Accelerators with Heterogeneous Threading Models: A Case Study of Matrix-3000
Jianbin Fang, Peng Zhang, Chun Huang, Tao Tang, Kai Lu, Ruibo Wang, Zheng Wang
DOI: https://doi.org/10.48550/arXiv.2210.12230
2022-10-22
Abstract: As the hardware industry moves towards using specialized heterogeneous many-cores to avoid the effects of the power wall, software developers are finding it hard to deal with the complexity of these systems. This article shares our experience in developing a programming model and its supporting compiler and libraries for Matrix-3000, which is designed for next-generation exascale supercomputers but has a complex memory hierarchy and processor organization. To assist its software development, we developed a software stack from scratch that includes a low-level programming interface and a high-level OpenCL compiler. Our low-level programming model offers native programming support for using the bare-metal accelerators of Matrix-3000, while the high-level model allows programmers to use the OpenCL programming standard. We detail our design choices and highlight the lessons learned from developing systems software to enable the programming of bare-metal accelerators. Our programming models have been deployed to the production environment of an exascale prototype system.
Programming Languages; Distributed, Parallel, and Cluster Computing; Performance