BOOST: Block Minifloat-Based On-Device CNN Training Accelerator with Transfer Learning

Chuliang Guo, Binglei Lou, Xueyuan Liu, David Boland, Philip H. W. Leong, Cheng Zhuo
DOI: https://doi.org/10.1109/iccad57390.2023.10323638
2023-01-01
Abstract: Adapting CNNs to changing problems is challenging on resource-limited edge devices due to intensive computations, high precision requirements, large storage needs, and high bandwidth demands. This paper presents BOOST, a novel block minifloat (BM)-based parallel CNN training accelerator for transfer learning (TL) on memory- and computation-constrained FPGAs. By updating a small number of layers online, BOOST enables adaptation to changing problems. Our approach uses a unified 8-bit BM datatype, bm(2,5), i.e., one sign bit, 2 exponent bits, and 5 mantissa bits, and proposes unified Conv and dilated Conv blocks that support non-unit stride and enable task-level parallelism during back-propagation to minimize latency. For ResNet20 and VGG-like training on the CIFAR-10 and SVHN datasets, BOOST achieves near 32-bit floating-point accuracy while reducing latency by 21%-43% and BRAM usage by 63%-66% compared to back-propagation training without TL. Notably, BOOST outperforms prior SOTA works, achieving per-batch throughput of 131 and 209 GOPs for ResNet20 and VGG-like, respectively.
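To make the bm(2,5) layout concrete, the following is a minimal Python sketch of how an 8-bit code with one sign bit, 2 exponent bits, and 5 mantissa bits could be decoded and how a block of values could share one exponent bias. The abstract does not specify the bias selection rule, rounding mode, or subnormal handling, so those choices here (a bias derived from the block maximum, nearest-value rounding, subnormals when the exponent field is zero, no infinities or NaNs) are assumptions, and all function names are hypothetical rather than taken from the paper.

```python
import math

def decode_bm25(code, shared_bias):
    """Decode one 8-bit bm(2,5) code: [sign:1][exponent:2][mantissa:5].

    Assumption: exponent field 0 encodes subnormals (no implicit leading 1),
    and there are no infinities or NaNs.
    """
    sign = -1.0 if (code >> 7) & 0b1 else 1.0
    exp = (code >> 5) & 0b11
    man = code & 0b11111
    if exp == 0:
        return sign * (man / 32.0) * 2.0 ** (1 - shared_bias)
    return sign * (1.0 + man / 32.0) * 2.0 ** (exp - shared_bias)

def choose_shared_bias(block):
    """Pick a per-block bias so the largest magnitude lands in the top binade.

    Assumption: this max-based rule is illustrative, not the paper's rule.
    """
    max_abs = max(abs(x) for x in block)
    if max_abs == 0.0:
        return 0
    return 3 - math.floor(math.log2(max_abs))

def quantize_block(block):
    """Quantize a block to bm(2,5) codes by nearest-value search."""
    bias = choose_shared_bias(block)
    table = [(decode_bm25(c, bias), c) for c in range(256)]
    codes = [min(table, key=lambda t: abs(t[0] - x))[1] for x in block]
    return bias, codes

def dequantize_block(bias, codes):
    """Map bm(2,5) codes back to Python floats using the shared bias."""
    return [decode_bm25(c, bias) for c in codes]

if __name__ == "__main__":
    bias, codes = quantize_block([1.0, -0.5, 0.25, 0.0])
    print(bias, [format(c, "08b") for c in codes])
    print(dequantize_block(bias, codes))
```

Because only 2 exponent bits are stored per value, most of the dynamic range comes from the shared bias, which is why the bias is computed once per block rather than per element in this sketch.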