A Heterogeneous Full-stack AI Platform for Performance Monitoring and Hardware-specific Optimizations

Zikang Zhou, Chao Fu, Ruiqi Xie, Jun Han
DOI: https://doi.org/10.1109/MCSoC51149.2021.00032
2021-01-01
Abstract: Many hardware accelerators have been proposed to speed up DNN computation for real-time applications. However, because of constraints imposed by accelerator microarchitecture, the same neural network often shows large performance differences when deployed on different accelerators. This forces network designers to rethink the network structure from a hardware perspective; a network designed this way is more likely to achieve better performance on the targeted accelerator. In this paper, to explore hardware-specific optimizations, we design a full-stack heterogeneous evaluation platform with a monitoring function, based on the open-source neural network accelerator NVDLA and TVM. The platform integrates two processors, with Arm and RISC-V instruction sets, and a DNN accelerator. DNNs from common frameworks (PyTorch, Keras, ONNX, etc.) can be deployed on the platform through a simple process to analyze their adaptability to the hardware. Based on the platform, we conduct experiments to demonstrate how the neural network affects performance on a specific hardware design. The experimental results show that an unsuitable network structure causes additional data transfer on the target hardware, which is the main source of performance and energy degradation. The order of network operators, the width and depth of the network, and the number of operations unsupported by the accelerator all affect the network's performance on a specific accelerator. Designers should apply targeted optimizations for the specific hardware deployment, and NAS (Neural Architecture Search) should take these factors into account.
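The claim that operator order and unsupported operations cause extra data transfer can be illustrated with a toy model. The sketch below is not the paper's measurement method: it assumes, for illustration only, that every switch between accelerator and CPU execution costs one data transfer. The operator names and the `SUPPORTED` set are hypothetical and do not describe NVDLA's actual capabilities.

```python
from typing import Iterable, Set


def count_device_transitions(ops: Iterable[str], supported: Set[str]) -> int:
    """Count switches between accelerator and CPU along an operator sequence.

    Each switch is treated as a proxy for one extra data transfer
    (an assumption of this toy model, not a measured cost).
    """
    transitions = 0
    prev_on_accel = None  # device of the previous operator, unknown at start
    for op in ops:
        on_accel = op in supported
        if prev_on_accel is not None and on_accel != prev_on_accel:
            transitions += 1
        prev_on_accel = on_accel
    return transitions


# Hypothetical set of operators the accelerator can run.
SUPPORTED = {"conv2d", "relu"}

# Same operators, different order. "custom_op" is a made-up unsupported op
# that falls back to the CPU.
grouped = ["conv2d", "relu", "conv2d", "relu", "custom_op", "custom_op"]
interleaved = ["conv2d", "custom_op", "relu", "conv2d", "custom_op", "relu"]

print(count_device_transitions(grouped, SUPPORTED))
print(count_device_transitions(interleaved, SUPPORTED))
```

Under this model, the two sequences contain the same operators, but the interleaved order switches devices more often, which is the kind of structural effect the abstract attributes to unsuitable network designs.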