Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators

Axel Stjerngren, Perry Gibson, José Cano
DOI: https://doi.org/10.48550/arXiv.2204.12418
2022-04-27
Abstract: Reconfigurable accelerators for deep neural networks (DNNs) promise to improve performance metrics such as inference latency. STONNE is the first cycle-accurate simulator for reconfigurable DNN inference accelerators, and it allows for the exploration of accelerator designs and their configuration space. However, preparing models for evaluation and exploring the configuration space in STONNE is a manual, time-consuming process for developers, which is a barrier for research. This paper introduces Bifrost, an end-to-end framework for the evaluation and optimization of reconfigurable DNN inference accelerators. Bifrost operates as a frontend for STONNE and leverages the TVM deep learning compiler stack to parse models and automate offloading of accelerated computations. We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost introduces a module leveraging AutoTVM to efficiently explore accelerator designs and the dataflow mapping space to optimize performance. This is demonstrated by tuning the MAERI architecture and generating efficient dataflow mappings for AlexNet, obtaining an average speedup of $50\times$ for the convolutional layers and $11\times$ for the fully connected layers. Our code is available at http://www.github.com/gicLAB/bifrost.
Machine Learning; Hardware Architecture; Distributed, Parallel, and Cluster Computing; Performance
What problem does this paper attempt to address?
This paper aims to simplify and automate the evaluation and optimization of reconfigurable deep neural network (DNN) inference accelerators. To address shortcomings of the existing tool STONNE, it proposes a new framework, Bifrost, that improves research efficiency and reduces manual work.

### 1. Problems of the Existing Tool STONNE

STONNE is a cycle-accurate simulator for reconfigurable DNN accelerators that lets researchers explore the design space of hardware configurations and dataflow mappings. However, using STONNE for research has the following problems:

- **Complex Manual Operations**: Preparing models and exploring the configuration space require a great deal of manual work, such as rewriting the PyTorch model definition to suit STONNE.
- **Limited Supported Frameworks**: STONNE only supports PyTorch models, which limits its scope of application.
- **Lack of Automated Mapping Generation Tools**: Some external tools (such as mRNA) can generate optimized mappings for specific architectures, but they are not directly integrated with STONNE, which adds further manual steps.

### 2. Solutions of Bifrost

To overcome these problems, the paper proposes the Bifrost framework. Its main contributions include:

- **Automated Model Preparation and Configuration Space Exploration**: By integrating TVM (a deep learning compiler stack), Bifrost can automatically parse models from multiple deep learning frameworks (such as PyTorch, TensorFlow, and ONNX) and automatically generate hardware configuration files suitable for STONNE.
- **Optimized Mapping Generation**: Bifrost introduces a module based on AutoTVM that efficiently explores the hardware design and dataflow mapping space to optimize performance. For example, it can tune the blocking size of a convolutional layer to reduce the number of clock cycles.
- **Support for More Accelerator Architectures**: Bifrost supports the existing MAERI and SIGMA architectures and can also be extended to support new accelerator architectures.
- **Improved Research Efficiency**: By automating many tedious manual steps, Bifrost lets researchers focus on the algorithms and the architecture design themselves.

### 3. Experimental Results

The paper evaluates Bifrost experimentally. For example, tuning the MAERI architecture and its dataflow mappings for AlexNet yields an average 50-fold speedup on the convolutional layers and an 11-fold speedup on the fully connected layers.

### Summary

In short, the paper introduces Bifrost to remove the complex manual steps and limited framework support that hamper the evaluation and optimization of reconfigurable DNN accelerators with existing tools, thereby improving research efficiency and supporting the design and development of such accelerators.
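
The blocking-size tuning mentioned under "Optimized Mapping Generation" can be pictured with a small, self-contained sketch. Everything in it is an assumption made for illustration: the layer shape, the 128-multiplier budget, the `toy_cycles` cost model (one cycle per tile iteration), and the exhaustive search all stand in for what STONNE's cycle-accurate simulation and an AutoTVM tuner would actually do. It uses neither the Bifrost nor the TVM API.

```python
from itertools import product
from math import ceil, prod

# Hypothetical convolution layer (not from the paper): K output channels,
# C input channels, R x S kernel, X x Y output feature map.
LAYER = {"K": 64, "C": 32, "R": 3, "S": 3, "X": 14, "Y": 14}
NUM_MULTIPLIERS = 128  # assumed size of the accelerator's multiplier array


def divisors(n):
    """Candidate tile sizes for a dimension: its exact divisors."""
    return [d for d in range(1, n + 1) if n % d == 0]


def multipliers_used(tiles):
    """Toy assumption: a mapping occupies one multiplier per tile element."""
    return prod(tiles.values())


def toy_cycles(tiles, layer=LAYER):
    """Toy cost model: one cycle per tile iteration. Not STONNE's simulation."""
    return prod(ceil(layer[d] / tiles[d]) for d in layer)


def search(layer=LAYER, budget=NUM_MULTIPLIERS):
    """Exhaustively try every tiling that fits and keep the cheapest one."""
    dims = list(layer)
    best_tiles, best_cycles = None, None
    for combo in product(*(divisors(layer[d]) for d in dims)):
        tiles = dict(zip(dims, combo))
        if multipliers_used(tiles) > budget:
            continue  # mapping does not fit on the assumed hardware
        cycles = toy_cycles(tiles, layer)
        if best_cycles is None or cycles < best_cycles:
            best_tiles, best_cycles = tiles, cycles
    return best_tiles, best_cycles


if __name__ == "__main__":
    baseline = toy_cycles({d: 1 for d in LAYER})  # every tile size set to 1
    tiles, cycles = search()
    print(f"baseline: {baseline} cycles, tuned: {cycles} cycles with {tiles}")
```

A real tuner would replace the exhaustive loop with a guided search and would obtain the cycle count from the simulator rather than from a closed-form estimate. The sketch only shows why the choice of tile sizes changes the cycle count under a fixed hardware budget.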