MZ Core: an Enhanced Matrix Acceleration Engine for HPC/AI Applications

Yasong Cao, Mei Wen, Junzhong Shen, Sheng Liu, Zhi Wang, Minjin Tang, Yahao Fang, Jianchao Yang, Renyu Yang, Yuhan Kang, Jiawei Fei
DOI: https://doi.org/10.1109/hpcc-dss-smartcity-dependsys57074.2022.00050
2022-01-01
Abstract: The convergence of High-Performance Computing (HPC) and Artificial Intelligence (AI) has become a promising trend. Because HPC and AI applications have different computation patterns, it is challenging to design an architecture that balances their demands. To address this, we propose Matrix Zone (MZ), an enhanced systolic array-based matrix engine that accelerates General Matrix Multiplication (GEMM) for both HPC and AI applications. We develop a semi-memory hierarchy to reduce on-chip area consumption and a data stitching method to support multi-precision floating-point processing efficiently. We demonstrate that MZ significantly improves performance for both HPC and AI applications. For AI-GEMM tasks, the performance of MZ is on average 1.80X (FP32), 15.11X (FP16), and 3.48X (FP16-FP32) that of the TPU-like model. In Convolutional (CONV) layers, MZ's performance is on average more than 10.29X (FP32) and 26.37X (FP16) that of the HPC core; in Fully Connected (FC) layers, it is on average 4.66X (FP32) and 20.75X (FP16) that of the HPC core. For HPC-GEMM tasks, the HPC core combined with MZ is on average 6.28X faster than the HPC core alone and 1.15X faster than MZ alone. The area of MZ is 2.052 square millimeters.