AIbench: a Tool for Benchmarking Huawei Ascend AI Processors

Yang Xiao, Zeke Wang
DOI: https://doi.org/10.1007/s42514-024-00187-x
2024-01-01
CCF Transactions on High Performance Computing
Abstract: In recent years, many AI accelerators, e.g., Google TPU and Huawei Ascend, have been proposed to accelerate Deep Learning applications such as CNN and NLP. Because these accelerators are specialized for AI model training and inference, they can provide higher performance per watt than GPUs. AI processors typically provide custom matrix and vector instructions, yet despite their wide adoption in the deep learning domain, their potential is not fully harvested in other compute-intensive domains that need massive matrix and vector operations. A significant challenge in harnessing AI processors in these domains is that their performance characteristics are undisclosed. To this end, we benchmark AI processors comprehensively so that programmers can easily understand the performance characteristics of AI processors, which generally share a similar architecture. We present AIBench, a benchmarking tool designed to reveal the underlying details of an AI processor. As a first target, we benchmark Huawei's Ascend accelerator. The benchmarking results show that (1) an Ascend 910 AI chip can provide 216 TFLOPS for float16 data from the matrix unit and 3390 GFLOPS from the vector unit, and (2) the performance of an AI core depends on choosing the appropriate data transmission and operation modes. With an unsuitable transmission mode, the time spent moving data into and out of the core can become the bottleneck.