BSViT: A Bit-Serial Vision Transformer Accelerator Exploiting Dynamic Patch and Weight Bit-Group Quantization

Gang Wang, Siqi Cai, Wenjie Li, Dongxu Lyu, Guanghui He
DOI: https://doi.org/10.1109/tcsi.2024.3426653
2024-01-01
Abstract: Vision Transformers (ViTs) have achieved remarkable success in computer vision (CV) and are increasingly recognized as the new backbone for vision-language multi-modal tasks. Despite their success, the high computational cost associated with ViTs hinders their inference efficiency. In this paper, we introduce BSViT, a bit-serial Vision Transformer accelerator enhanced by algorithm-hardware co-design. BSViT can efficiently accelerate both plain and hierarchical Vision Transformer inference. At the algorithm level, we propose a post-training quantization scheme named dynamic patch and weight bit-group quantization. We first introduce a dynamic patch quantization (DPQ) scheme to dynamically allocate bit-width to different image patches based on their importance, thus reducing bit-width and saving computation without significantly impacting accuracy. Second, we propose a weight bit-group quantization (BGQ) scheme to evenly distribute bits within groups and achieve workload balance across processing elements (PEs). At the hardware level, we propose a term-separate bit-serial accelerator to efficiently support DPQ and BGQ. We introduce dense and sparse bit-serial PEs to manipulate the dense least significant term (LST) and sparse most significant term (MST) workloads. A dense-sparse hybrid dataflow is devised to efficiently balance the two kinds of workloads. Our experiments show that BSViT can achieve up to 1.95x speedup and 2.72x energy efficiency compared to state-of-the-art (SOTA) bit-serial accelerators and achieve up to 3.69x energy efficiency compared to SOTA Transformer accelerators.
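
To make concrete why a per-patch bit-width saves work on a bit-serial datapath, the following is a minimal Python sketch and not the paper's implementation. The abstract does not specify the importance metric, the bit-width levels, or the PE design, so the two-level allocation (`high_bits`, `low_bits`, `keep_ratio`) and both function names are hypothetical. The sketch only illustrates the general idea that a bit-serial dot product takes one step per activation bit, so patches given fewer bits need fewer steps.

```python
def allocate_bits(importance, high_bits=8, low_bits=4, keep_ratio=0.5):
    """Assumed two-level scheme: the most important patches get high_bits,
    the rest get low_bits. The importance scores are supplied by the caller."""
    n_high = int(len(importance) * keep_ratio)
    # Patch indices sorted from most to least important.
    ranked = sorted(range(len(importance)),
                    key=lambda i: importance[i], reverse=True)
    bits = [low_bits] * len(importance)
    for i in ranked[:n_high]:
        bits[i] = high_bits
    return bits


def bit_serial_dot(activations, weights, bits):
    """Dot product that consumes one activation bit-plane per step.
    Activations are unsigned integers that fit in `bits` bits;
    weights are ordinary signed integers."""
    acc = 0
    for b in range(bits):
        # Sum the weights whose activation has bit b set, then shift into place.
        partial = sum(w for a, w in zip(activations, weights) if (a >> b) & 1)
        acc += partial << b
    return acc, bits  # result and number of bit-serial steps


patch_bits = allocate_bits([0.9, 0.1, 0.5, 0.3])
result, steps = bit_serial_dot([3, 5, 2], [1, -2, 4], bits=4)
```

In this toy model the step count equals the assigned bit-width, so a patch quantized to 4 bits costs half the steps of one kept at 8 bits.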