Abstract: Since Google proposed the Transformer in 2017, it has driven significant progress in natural language processing (NLP), but at the cost of a large amount of computation and a large number of parameters. Previous researchers have designed accelerator architectures for Transformer models on field-programmable gate arrays (FPGAs) to handle NLP tasks efficiently. The Transformer has since spread to computer vision (CV), where it has rapidly surpassed convolutional neural networks (CNNs) on various image tasks. However, the image data used in CV differs markedly from the sequence data used in NLP, and the details of the Transformer-based models in the two fields also differ. The difference in data gives rise to a locality problem. The difference in model structure gives rise to a path dependence problem, which existing related accelerator designs overlook. Therefore, in this work, we propose ViA, a novel vision transformer (ViT) accelerator architecture based on FPGA, to execute Transformer applications efficiently and avoid the costs of these challenges. By analyzing the data structure of the ViT, we design an appropriate partition strategy that reduces the impact of data locality in images and improves the efficiency of computation and memory access. Meanwhile, by observing the computing flow of the ViT, we use half-layer mapping and throughput analysis to reduce the impact of the path dependence caused by the shortcut mechanism and to fully utilize hardware resources. Based on these optimization strategies, we design two reuse processing engines with an internal stream, which differ from previous overlap or stream design patterns.
In our experiments, we implement the ViA architecture on a Xilinx Alveo U50 FPGA and achieve approximately 5.2 times higher energy efficiency than the NVIDIA Tesla V100 and 4-10 times higher performance than related FPGA-based accelerators, with a peak computing performance of nearly 309.6 GOP/s.

HPTA: A High Performance Transformer Accelerator Based on FPGA

A Runtime-Adaptive Transformer Neural Network Accelerator on FPGAs

FTRANS: Energy-Efficient Acceleration of Transformers using FPGA

TransFRU: Efficient Deployment of Transformers on FPGA with Full Resource Utilization

HLSTransform: Energy-Efficient Llama 2 Inference on FPGAs Via High Level Synthesis

ProTEA: Programmable Transformer Encoder Acceleration on FPGA

FET-OPU: A Flexible and Efficient FPGA-Based Overlay Processor for Transformer Networks

ViA: A Novel Vision-Transformer Accelerator Based on FPGA

TATAA: Programmable Mixed-Precision Transformer Acceleration with a Transformable Arithmetic Architecture

FAMOUS: Flexible Accelerator for the Attention Mechanism of Transformer on UltraScale+ FPGAs

Hardware-friendly compression and hardware acceleration for transformer: A survey

LTrans-OPU: A Low-Latency FPGA-Based Overlay Processor for Transformer Networks

An Efficient FPGA-Based Accelerator for Swin Transformer

Enhancing Long Sequence Input Processing in FPGA-Based Transformer Accelerators Through Attention Fusion

H3D-Transformer: A Heterogeneous 3D (H3D) Computing Platform for Transformer Model Acceleration on Edge Devices

Unified Accelerator for Attention and Convolution in Inference Based on FPGA

Ayaka: A Versatile Transformer Accelerator with Low-Rank Estimation and Heterogeneous Dataflow

Fitop-Trans: Maximizing Transformer Pipeline Efficiency Through Fixed-Length Token Pruning on FPGA

A Cost-Efficient FPGA Implementation of Tiny Transformer Model using Neural ODE

FPGA-Based Vit Inference Accelerator Optimization

A Length Adaptive Algorithm-Hardware Co-design of Transformer on FPGA Through Sparse Attention and Dynamic Pipelining