Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures

Weiling Yang, Jianbin Fang, Dezun Dong
DOI: https://doi.org/10.1109/IPDPS49936.2021.00019
2021-01-01
Abstract: General Matrix Multiplication (GEMM) is a key subroutine in high-performance computing. There is a large body of work on evaluating and optimizing large-scale matrix multiplication, but how well small-scale matrix multiplication (SMM) performs is largely unknown, especially on ARMv8-based many-core architectures. In this work, we evaluate and characterize the performance of SMM subroutine...