Abstract: Reconfigurable accelerators for deep neural networks (DNNs) promise to improve performance metrics such as inference latency. STONNE is the first cycle-accurate simulator for reconfigurable DNN inference accelerators, and it allows exploration of accelerator designs and their configuration space. However, preparing models for evaluation and exploring the configuration space in STONNE is a manual process that consumes developer time, which is a barrier to research. This paper introduces Bifrost, an end-to-end framework for the evaluation and optimization of reconfigurable DNN inference accelerators. Bifrost operates as a frontend for STONNE and leverages the TVM deep learning compiler stack to parse models and automate offloading of accelerated computations. We discuss Bifrost's advantages over STONNE and other tools, and evaluate the MAERI and SIGMA architectures using Bifrost. Additionally, Bifrost introduces a module leveraging AutoTVM to efficiently explore the accelerator design and dataflow mapping space to optimize performance. This is demonstrated by tuning the MAERI architecture and generating efficient dataflow mappings for AlexNet, obtaining an average speedup of $50\times$ for the convolutional layers and $11\times$ for the fully connected layers. Our code is available at http://www.github.com/gicLAB/bifrost.

What problem does this paper attempt to address?

The paper aims to simplify and automate the evaluation and optimization of reconfigurable deep neural network (DNN) inference accelerators. To address shortcomings of the existing tool STONNE, it proposes a new framework named Bifrost that reduces manual work and improves research efficiency.

### 1. Problems of the Existing Tool STONNE

STONNE is a cycle-accurate simulator for reconfigurable DNN accelerators that allows researchers to explore the design space of different hardware configurations and dataflow mappings. However, using STONNE for research has the following problems:

- **Complex Manual Operations**: Preparing models and exploring the configuration space require a great deal of manual work, such as rewriting the PyTorch model definition to adapt it to STONNE.
- **Limited Supported Frameworks**: STONNE only supports PyTorch models, which limits its scope of application.
- **Lack of Automated Mapping Generation Tools**: Although some external tools (such as mRNA) can generate optimal mappings for specific architectures, they are not directly integrated with STONNE, resulting in additional manual steps.

### 2. Solutions of Bifrost

To overcome these problems, the paper proposes the Bifrost framework. Its main contributions include:

- **Automated Model Preparation and Configuration Space Exploration**: By integrating TVM (a deep learning compiler stack), Bifrost can automatically parse models from multiple deep learning frameworks (such as PyTorch, TensorFlow, and ONNX) and automatically generate hardware configuration files suitable for STONNE.
- **Optimized Mapping Generation**: Bifrost introduces a module based on AutoTVM that can efficiently explore the hardware design and dataflow mapping space to optimize performance, for example by adjusting the blocking size of a convolution layer to reduce the number of clock cycles.
- **Support for More Accelerator Architectures**: Bifrost supports the existing MAERI and SIGMA architectures and can be extended to support new accelerator architectures.
- **Improved Research Efficiency**: By automating many tedious manual steps, Bifrost lets researchers focus on the algorithms and the architecture design themselves.

### 3. Experimental Results

The paper verifies the effectiveness of Bifrost through experiments. For example, tuning the MAERI architecture and generating dataflow mappings for AlexNet with Bifrost yields an average speedup of 50× on the convolutional layers and 11× on the fully connected layers.

### Summary

By introducing the Bifrost framework, the paper addresses the complex manual operations and limited framework support that hamper the evaluation and optimization of reconfigurable DNN accelerators with existing tools, thereby improving research efficiency and supporting the design and development of such accelerators.
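To make the idea of mapping-space exploration from Section 2 concrete, the following is a minimal sketch of searching over blocking (tile) sizes for one convolution layer. Everything in it is an assumption for illustration: the layer shape, the multiplier count, the candidate tile sizes, and especially the toy cost model `estimated_cycles`, which stands in for a cycle-accurate simulation. It does not use the Bifrost, STONNE, or AutoTVM APIs, and it uses plain exhaustive search rather than any particular tuner.

```python
from itertools import product
from math import ceil

# Hypothetical convolution layer: K output channels, C input channels,
# X-by-Y output feature map. These values are not taken from the paper.
LAYER = {"K": 64, "C": 32, "X": 14, "Y": 14}

# Hypothetical number of multipliers available on the accelerator.
NUM_MULTIPLIERS = 128

# Candidate tile sizes tried for each dimension (an arbitrary choice).
CANDIDATES = (1, 2, 4, 8, 16)


def estimated_cycles(tile, layer=LAYER, num_multipliers=NUM_MULTIPLIERS):
    """Toy cost model: one cycle per tile step, ignoring memory and
    reduction overheads. Returns None if the tile needs more multipliers
    than are available."""
    t_k, t_c, t_x, t_y = tile
    if t_k * t_c * t_x * t_y > num_multipliers:
        return None
    return (
        ceil(layer["K"] / t_k)
        * ceil(layer["C"] / t_c)
        * ceil(layer["X"] / t_x)
        * ceil(layer["Y"] / t_y)
    )


def grid_search(candidates=CANDIDATES):
    """Exhaustively evaluate every tile and keep the cheapest valid one."""
    best_tile, best_cycles = None, None
    for tile in product(candidates, repeat=4):
        cycles = estimated_cycles(tile)
        if cycles is None:
            continue  # mapping does not fit on the hardware
        if best_cycles is None or cycles < best_cycles:
            best_tile, best_cycles = tile, cycles
    return best_tile, best_cycles


if __name__ == "__main__":
    baseline = estimated_cycles((1, 1, 1, 1))
    tile, cycles = grid_search()
    print(f"baseline cycles: {baseline}")
    print(f"best tile (K, C, X, Y): {tile}, cycles: {cycles}")
    print(f"speedup over baseline: {baseline / cycles:.1f}x")
```

The speedup printed here is the ratio of baseline cycles to tuned cycles under the toy model only; it says nothing about the figures reported in the paper. In a real flow, the cost function would be replaced by a simulator run, and exhaustive search would typically give way to a guided tuner once the space grows large.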

Bifrost: End-to-End Evaluation and Optimization of Reconfigurable DNN Accelerators
