Machine Learning-enabled Performance Model for DNN Applications and AI Accelerator

Hong An, Mingfan Li, Tianxiang Chen, Xiaoxin Xu, Bin Zhou, Hanxi Li, Ruohan Wu, Junshi Chen, Xinghui Tian
DOI: https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys57074.2022.00038
2022-12-01
Abstract: As innovations in deep learning systems and deep neural network (DNN) models continue to grow, accurate performance analysis is a promising tool for understanding and navigating the complex software-hardware interplay, especially for today's heterogeneous AI architectures. However, the actual execution of DNNs on dedicated accelerators involves challenges from nontrivial dataflow graph analysis, tensor compiler optimizations, and operator performance prediction. In this work, we propose a two-stage performance model framework that combines graph-level analysis and operator-based hotspot modeling to bridge the gap between high-level application performance and its software-hardware systems. By employing a machine learning (ML) solution, our performance model further captures low-level hardware-dependent information, including operator fusion and data layout transformation. Our graph analysis of mainstream models from the computer vision (CV), natural language processing (NLP), and recommendation domains selects a total of 26 kinds of operators and builds a dataset on the Huawei Ascend 910. With the well-trained model, our open-source performance model (source code available at https://github.com/Huawei-Performance-Model/Ascend-910b) achieves a 15.4% average error in predicting the execution time of DNN models, and our modeling of memory access and performance bottlenecks supports efficient execution of DNN models on future systems.
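The two-stage idea in the abstract (per-operator prediction combined at the graph level) can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the `Op` features, the linear per-operator cost model standing in for the trained ML predictor, the sequential-sum graph model, and the reading of "average error" as mean absolute percentage error are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Op:
    """One operator in a DNN graph (hypothetical feature set)."""
    kind: str           # operator type, e.g. "MatMul"
    flops: float        # floating-point operations performed
    bytes_moved: float  # bytes read from and written to memory


# Hypothetical per-operator-kind coefficients: (sec/FLOP, sec/byte, fixed overhead).
Coeffs = Dict[str, Tuple[float, float, float]]


def predict_op_time(op: Op, coeffs: Coeffs) -> float:
    """Operator stage: a linear placeholder for a trained ML predictor."""
    per_flop, per_byte, overhead = coeffs[op.kind]
    return per_flop * op.flops + per_byte * op.bytes_moved + overhead


def predict_model_time(ops: List[Op], coeffs: Coeffs) -> float:
    """Graph stage: assume operators run one after another and sum their times."""
    return sum(predict_op_time(op, coeffs) for op in ops)


def mean_abs_pct_error(predicted: List[float], measured: List[float]) -> float:
    """Average relative error in percent (assumed metric definition)."""
    errors = [abs(p - m) / m for p, m in zip(predicted, measured)]
    return 100.0 * sum(errors) / len(errors)


if __name__ == "__main__":
    coeffs = {"MatMul": (1e-12, 1e-9, 5e-6), "Relu": (0.0, 2e-9, 1e-6)}
    graph = [Op("MatMul", 2e9, 1e6), Op("Relu", 1e6, 4e6)]
    print(f"predicted time: {predict_model_time(graph, coeffs):.6f} s")
```

A real graph-level stage would also need to account for effects the abstract mentions, such as operator fusion and data layout transformation, which a plain sum over operators ignores.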
Computer Science,Engineering